SML/NJ - linked list which can hold any types - functional-programming

I trying to create a datatype for linked list which can hold all types at same time i.e linked list of void* elements , the designing is to create a Node datatype which hold a record contains Value and Next .
What I did so far is -
datatype 'a anything = dummy of 'a ; (* suppose to hold any type (i.e void*) *)
datatype linkedList = Node of {Value:dummy, Next:linkedList}; (* Node contain this record *)
As you can see the above trying does not works out , but I believe my idea is clear enough , so what changes are required here to make it work ?

I am not sure if you are being forced to use a record type. Because otherwise I think it is simpler to do:
datatype 'a linkedlist = Empty | Cons of 'a * 'a linkedlist
Then you can use it somewhat like:
val jedis = Cons ("Obi-wan", Cons("Luke", Cons("Yoda", Cons("Anakin", Empty))));
I think the use of the record is a poor choice here. I cannot even think how I could represent an empty list with that approach.
-EDIT-
To answer your comment about supporting multiple types:
datatype polymorphic = N of int | S of string | B of bool
Cons(S("A"), Cons(N(5), Cons(N(6), Cons(B(true), Empty))));
Given the circumstances you may prefer SML lists instead:
S("A")::N(5)::N(6)::B(true)::[];
Which produces the list
[S "A",N 5,N 6,B true]
That is, a list of the same type (i.e. polymorphic), but this type is capable of containing different kinds of things through its multiple constructors.

FYI, if it is important that the types of your polymorphic list remain open, you can use SML's built-in exception type: exn. The exn type is open and can be extended anywhere in the program.
exception INT of int
exception STR of string
val xs = [STR "A", INT 5, INT 6] : exn list
You can case selectively on particular types as usual:
val inc_ints = List.map (fn INT i => INT (i + 1) | other => other)
And you can later extend the type without mention of its previous definition:
exception BOOL of bool
val ys = [STR "A", INT 5, INT 6, BOOL true] : exn list
Notice that you can put the construction of any exception in there (here the div-by-zero exception):
val zs = Div :: ys : exn list
That said, this (ab)use really has very few good use cases and you are generally better off with a closed sum type as explained by Edwin in the answer above.

Related

SML: Value restriction error when recursively calling quicksort

I'm writing a quicksort function for an exercise. I already know of the 5-line functional quicksort; but I wanted to improve the partition by having it scan through the list once and return a pair of lists splitting the original list in half. So I wrote:
fun partition nil = (nil, nil)
| partition (pivot :: rest) =
let
fun part (lst, pivot, (lesseq, greater)) =
case lst of
[] => (lesseq, greater)
| (h::t) =>
if h <= pivot then part (t, pivot, (h :: lesseq, greater))
else part (t, pivot, (lesseq, h :: greater))
in
part (rest, pivot, ([pivot], []))
end;
This partitions well enough. It gives me a signature val partition = fn : int list -> int list * int list. It runs as expected.
It's when I use the quicksort below that things start to break.
fun quicksort_2 nil = nil
| quicksort_2 lst =
let
val (lesseq, greater) = partition lst
in
quicksort_2 lesseq # quicksort_2 greater
end;
I can run the above function if I eliminate the recursive calls to quicksort_2; but if I put them back in (to actually go and sort the thing), it will cease to run. The signature will be incorrect as well, giving me val quicksort_2 = fn : int list -> 'a list. The warning I receive when I call the function on a list is:
Warning: type vars not generalized because of value restriction are instantiated to dummy types (X1,X2,...)
What is the problem here? I'm not using any ref variables; the type annotation I've tried doesn't seem to help...
The main issue is that you're lacking the singleton list base case for your quicksort function. It ought to be
fun quicksort [ ] = [ ]
| quicksort [x] = [x]
| quicksort xs =
let
val (l, r) = partition xs
in
quicksort l # quicksort r
end
which should then have type int list -> int list given the type of your partition. We have to add this case as otherwise you'll never hit a base case and instead recurse indefinitely.
For some more detail on why you saw the issues you were having though:
The signature will be incorrect as well, giving me val quicksort_2 = fn : int list -> 'a list
This is because the codomain of your function was never restricted to be less general than 'a list. Taking a look at the possible branches in your original implementation we can see that in the nil branch you return nil (of most general type 'a list) and in the recursive case you get two 'a lists (per our assumptions thus far) and append them, resulting in an 'a list---this is fine so your type is not further restricted.
[Value Restriction Warning]
What is the problem here? I'm not using any ref variables
The value restriction isn't really related to refs (though can often arise when using them). Instead it is the prohibition that anything polymorphic at the top level must be a value by its syntax (and thus precludes the possibility that a computation is behind a type abstractor at the top level). Here it is because given xs : int list we (ignoring the value restriction) have quicksort_2 xs : 'a list---which would otherwise be polymorphic, but is not a syntactic value. Correspondingly it is value restricted.

Finding Max/Min number in list in Ocaml

I just began to learn Ocaml recently and now just started to practice some codes.
In this case, I tried to find a maximum number in a list but it keeps return me an error message.
let max: int list -> int
= fun lst ->
match lst with
|[] -> 0
|h::[] -> h
|h::t -> let a = max t in
if h < a then h
else
a;;
Ocaml keeps saying:
Error: This expression has type int list -> int list but an expression was expected of type int.
I don't understand why a is an int list though I claimed it as a max t which is a function that makes int list into int... Thanks for yout help.
a has type int list -> int list because max in let a = max t does not refer to your function, but to the one defined in Stdlib. Stdlib (Previously called Pervasives) contains definitions used very commonly and is therefore opened for you by default.
Stdlib.max has the type 'a -> 'a -> 'a. So when you pass it an int list, the compiler infers 'a to be int list and returns a function with type int list -> int list.
Why does max not refer to your own max function? Because you've forgotten the rec keyword. As you probably know already, rec makes the function available to be called inside itself.
You should avoid shadowing Stdlib function names to avoid confusing errors like this. If you had chosen a different name, you'd simply get an error telling you that max does not exist.

Balanced tree for functional symbol table

I'm doing exercises of "Modern Compiler Implementation in ML" (Andrew Appel). One of which (ex 1.1 d) is to recommend a balanced-tree data structure for functional symbol table. Appeal mentioned such data structure should rebalance on insertion but not on lookup. Being totally new to functional programming, I found this confusing. What is key insight on this requirement?
A tree that’s rebalanced on every insertion and deletion doesn’t need to rebalance on lookup, because lookup doesn’t modify the structure. If it was balanced before a lookup, it will stay balanced during and after.
In functional languages, insertion and rebalancing can be more expensive than in a procedural one. Because you can’t alter any node in place, you replace a node by creating a new node, then replacing its parent with a new node whose children are the new daughter and the unaltered older daughter, and then replace the grandparent node with one whose children are the new parent and her older sister, and so on up. You finish when you create a new root node for the updated tree and garbage-collect all the nodes you replaced. However, some tree structures have the desirable property that they need to replace no more than O(log N) nodes of the tree on an insertion and can re-use the rest. This means that the rotation of a red-black tree (for example) has not much more overhead than an unbalanced insertion.
Also, you will typically need to query a symbol table much more often than you update it. It therefore becomes less tempting to try to make insertion faster: if you’re inserting, you might as well rebalance.
The question of which self-balancing tree structure is best for a functional language has been asked here, more than once.
Since Davislor already answered your question extensively, here are mostly some implementation hints. I would add that choice of data structure for your symbol table is probably not relevant for a toy compiler. Compilation time only starts to become an issue when you compiler is used on a lot of code and the code is recompiled often.
Sticking to a O(n) insert/lookup data structure is fine in practice until it isn't.
Signature-wise, all you want is a key-value mapping, insert, and lookup:
signature SymTab =
sig
type id
type value
type symtab
val empty : symtab
val insert : id -> value -> symtab -> symtab
val lookup : id -> symtab -> value option
end
A simple O(n) implementation with lists might be:
structure ListSymTab : SymTab =
struct
type id = string
type value = int
type symtab = (id * value) list
val empty = []
fun insert id value [] = [(id, value)]
| insert id value ((id',value')::symtab) =
if id = id'
then (id,value)::symtab
else (id',value')::insert id value symtab
fun lookup _ [] = NONE
| lookup id ((id',value)::symtab) =
if id = id' then SOME value else lookup id symtab
end
You might use it like:
- ListSymTab.lookup "hello" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = SOME 42 : int option
Then again, maybe your symbol table doesn't map strings to integers, or you may have one symbol table for variables and one for functions.
You could parameterise the id/value types using a functor:
functor ListSymTabFn (X : sig
eqtype id
type value
end) : SymTab =
struct
type id = X.id
type value = X.value
(* The rest is the same as ListSymTab. *)
end
And you might use it like:
- structure ListSymTab = ListSymTabFn(struct type id = string type value = int end);
- ListSymTab.lookup "world" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = NONE : int option
All you need for a list-based symbol table is that the identifiers/symbols can be compared for equality. For your balanced-tree symbol table, you need identifiers/symbols to be orderable.
Instead of implementing balanced trees from scratch, look e.g. at SML/NJ's RedBlackMapFn:
To create a structure implementing maps (dictionaries) over a type T [...]:
structure MapT = RedBlackMapFn (struct
type ord_key = T
val compare = compareT
end)
Try this example with T as string and compare as String.compare:
$ sml
Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
- structure MapS = RedBlackMapFn (struct
type ord_key = string
val compare = String.compare
end);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
structure MapS : ORD_MAP?
- open MapS;
...
Opening the structure is an easy way to explore the available functions and their types.
We can then create a similar functor to ListSymTabFn, but one that takes an additional compare function:
functor RedBlackSymTabFn (X : sig
type id
type value
val compare : id * id -> order
end) : SymTab =
struct
type id = X.id
type value = X.value
structure SymTabX = RedBlackMapFn (struct
type ord_key = X.id
val compare = X.compare
end)
(* The 'a map type inside SymTabX maps X.id to anything. *)
(* We are, however, only interested in mapping to values. *)
type symtab = value SymTabX.map
(* Use other stuff in SymTabT for empty, insert, lookup. *)
end
Finally, you can use this as your symbol table:
structure SymTab = RedBlackSymTabFn(struct
type id = string
type value = int
val compare = String.compare
end);

Function with different argument types

I read about polymorphism in function and saw this example
fun len nil = 0
| len rest = 1 + len (tl rest)
All the other examples dealt with nil arg too.
I wanted to check the polymorphism concept on other types, like
fun func (a : int) : int = 1
| func (b : string) : int = 2 ;
and got the follow error
stdIn:1.6-2.33 Error: parameter or result constraints of clauses don't agree
[tycon mismatch]
this clause: string -> int
previous clauses: int -> int
in declaration:
func = (fn a : int => 1: int
| b : string => 2: int)
What is the mistake in the above function? Is it legal at all?
Subtype Polymorphism:
In a programming languages like Java, C# o C++ you have a set of subtyping rules that govern polymorphism. For instance, in object-oriented programming languages if you have a type A that is a supertype of a type B; then wherever A appears you can pass a B, right?
For instance, if you have a type Mammal, and Dog and Cat were subtypes of Mammal, then wherever Mammal appears you could pass a Dog or a Cat.
You can achive the same concept in SML using datatypes and constructors. For instance:
datatype mammal = Dog of String | Cat of String
Then if you have a function that receives a mammal, like:
fun walk(m: mammal) = ...
Then you could pass a Dog or a Cat, because they are constructors for mammals. For instance:
walk(Dog("Fido"));
walk(Cat("Zoe"));
So this is the way SML achieves something similar to what we know as subtype polymorphism in object-oriented languajes.
Ad-hoc Polymorphysm:
Coercions
The actual point of confusion could be the fact that languages like Java, C# and C++ typically have automatic coercions of types. For instance, in Java an int can be automatically coerced to a long, and a float to a double. As such, I could have a function that accepts doubles and I could pass integers. Some call these automatic coercions ad-hoc polymorphism.
Such form of polymorphism does not exist in SML. In those cases you are forced to manually coerced or convert one type to another.
fun calc(r: real) = r
You cannot call it with an integer, to do so you must convert it first:
calc(Real.fromInt(10));
So, as you can see, there is no ad-hoc polymorphism of this kind in SML. You must do castings/conversions/coercions manually.
Function Overloading
Another form of ad-hoc polymorphism is what we call method overloading in languages like Java, C# and C++. Again, there is no such thing in SML. You may define two different functions with different names, but no the same function (same name) receiving different parameters or parameter types.
This concept of function or method overloading must not be confused with what you use in your examples, which is simply pattern matching for functions. That is syntantic sugar for something like this:
fun len xs =
if null xs then 0
else 1 + len(tl xs)
Parametric Polymorphism:
Finally, SML offers parametric polymorphism, very similar to what generics do in Java and C# and I understand that somewhat similar to templates in C++.
So, for instance, you could have a type like
datatype 'a list = Empty | Cons of 'a * 'a list
In a type like this 'a represents any type. Therefore this is a polymorphic type. As such, I could use the same type to define a list of integers, or a list of strings:
val listOfString = Cons("Obi-wan", Empty);
Or a list of integers
val numbers = Cons(1, Empty);
Or a list of mammals:
val pets = Cons(Cat("Milo", Cons(Dog("Bentley"), Empty)));
This is the same thing you could do with SML lists, which also have parametric polymorphism:
You could define lists of many "different types":
val listOfString = "Yoda"::"Anakin"::"Luke"::[]
val listOfIntegers 1::2::3::4::[]
val listOfMammals = Cat("Misingo")::Dog("Fido")::Cat("Dexter")::Dog("Tank")::[]
In the same sense, we could have parametric polymorphism in functions, like in the following example where we have an identity function:
fun id x = x
The type of x is 'a, which basically means you can substitute it for any type you want, like
id("hello");
id(35);
id(Dog("Diesel"));
id(Cat("Milo"));
So, as you can see, combining all these different forms of polymorphism you should be able to achieve the same things you do in other statically typed languages.
No, it's not legal. In SML, every function has a type. The type of the len function you gave as an example is
fn : 'a list -> int
That is, it takes a list of any type and returns an integer. The function you're trying to make takes and integer or a string, and returns an integer, and that's not legal in the SML type system. The usual workaround is to make a wrapper type:
datatype wrapper = I of int | S of string
fun func (I a) = 1
| func (S a) = 2
That function has type
fn : wrapper -> int
Where wrapper can contain either an integer or a string.

What is 'Pattern Matching' in functional languages?

I'm reading about functional programming and I've noticed that Pattern Matching is mentioned in many articles as one of the core features of functional languages.
Can someone explain for a Java/C++/JavaScript developer what does it mean?
Understanding pattern matching requires explaining three parts:
Algebraic data types.
What pattern matching is
Why its awesome.
Algebraic data types in a nutshell
ML-like functional languages allow you define simple data types called "disjoint unions" or "algebraic data types". These data structures are simple containers, and can be recursively defined. For example:
type 'a list =
| Nil
| Cons of 'a * 'a list
defines a stack-like data structure. Think of it as equivalent to this C#:
public abstract class List<T>
{
public class Nil : List<T> { }
public class Cons : List<T>
{
public readonly T Item1;
public readonly List<T> Item2;
public Cons(T item1, List<T> item2)
{
this.Item1 = item1;
this.Item2 = item2;
}
}
}
So, the Cons and Nil identifiers define simple a simple class, where the of x * y * z * ... defines a constructor and some data types. The parameters to the constructor are unnamed, they're identified by position and data type.
You create instances of your a list class as such:
let x = Cons(1, Cons(2, Cons(3, Cons(4, Nil))))
Which is the same as:
Stack<int> x = new Cons(1, new Cons(2, new Cons(3, new Cons(4, new Nil()))));
Pattern matching in a nutshell
Pattern matching is a kind of type-testing. So let's say we created a stack object like the one above, we can implement methods to peek and pop the stack as follows:
let peek s =
match s with
| Cons(hd, tl) -> hd
| Nil -> failwith "Empty stack"
let pop s =
match s with
| Cons(hd, tl) -> tl
| Nil -> failwith "Empty stack"
The methods above are equivalent (although not implemented as such) to the following C#:
public static T Peek<T>(Stack<T> s)
{
if (s is Stack<T>.Cons)
{
T hd = ((Stack<T>.Cons)s).Item1;
Stack<T> tl = ((Stack<T>.Cons)s).Item2;
return hd;
}
else if (s is Stack<T>.Nil)
throw new Exception("Empty stack");
else
throw new MatchFailureException();
}
public static Stack<T> Pop<T>(Stack<T> s)
{
if (s is Stack<T>.Cons)
{
T hd = ((Stack<T>.Cons)s).Item1;
Stack<T> tl = ((Stack<T>.Cons)s).Item2;
return tl;
}
else if (s is Stack<T>.Nil)
throw new Exception("Empty stack");
else
throw new MatchFailureException();
}
(Almost always, ML languages implement pattern matching without run-time type-tests or casts, so the C# code is somewhat deceptive. Let's brush implementation details aside with some hand-waving please :) )
Data structure decomposition in a nutshell
Ok, let's go back to the peek method:
let peek s =
match s with
| Cons(hd, tl) -> hd
| Nil -> failwith "Empty stack"
The trick is understanding that the hd and tl identifiers are variables (errm... since they're immutable, they're not really "variables", but "values" ;) ). If s has the type Cons, then we're going to pull out its values out of the constructor and bind them to variables named hd and tl.
Pattern matching is useful because it lets us decompose a data structure by its shape instead of its contents. So imagine if we define a binary tree as follows:
type 'a tree =
| Node of 'a tree * 'a * 'a tree
| Nil
We can define some tree rotations as follows:
let rotateLeft = function
| Node(a, p, Node(b, q, c)) -> Node(Node(a, p, b), q, c)
| x -> x
let rotateRight = function
| Node(Node(a, p, b), q, c) -> Node(a, p, Node(b, q, c))
| x -> x
(The let rotateRight = function constructor is syntax sugar for let rotateRight s = match s with ....)
So in addition to binding data structure to variables, we can also drill down into it. Let's say we have a node let x = Node(Nil, 1, Nil). If we call rotateLeft x, we test x against the first pattern, which fails to match because the right child has type Nil instead of Node. It'll move to the next pattern, x -> x, which will match any input and return it unmodified.
For comparison, we'd write the methods above in C# as:
public abstract class Tree<T>
{
public abstract U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc);
public class Nil : Tree<T>
{
public override U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc)
{
return nilFunc();
}
}
public class Node : Tree<T>
{
readonly Tree<T> Left;
readonly T Value;
readonly Tree<T> Right;
public Node(Tree<T> left, T value, Tree<T> right)
{
this.Left = left;
this.Value = value;
this.Right = right;
}
public override U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc)
{
return nodeFunc(Left, Value, Right);
}
}
public static Tree<T> RotateLeft(Tree<T> t)
{
return t.Match(
() => t,
(l, x, r) => r.Match(
() => t,
(rl, rx, rr) => new Node(new Node(l, x, rl), rx, rr))));
}
public static Tree<T> RotateRight(Tree<T> t)
{
return t.Match(
() => t,
(l, x, r) => l.Match(
() => t,
(ll, lx, lr) => new Node(ll, lx, new Node(lr, x, r))));
}
}
For seriously.
Pattern matching is awesome
You can implement something similar to pattern matching in C# using the visitor pattern, but its not nearly as flexible because you can't effectively decompose complex data structures. Moreover, if you are using pattern matching, the compiler will tell you if you left out a case. How awesome is that?
Think about how you'd implement similar functionality in C# or languages without pattern matching. Think about how you'd do it without test-tests and casts at runtime. Its certainly not hard, just cumbersome and bulky. And you don't have the compiler checking to make sure you've covered every case.
So pattern matching helps you decompose and navigate data structures in a very convenient, compact syntax, it enables the compiler to check the logic of your code, at least a little bit. It really is a killer feature.
Short answer: Pattern matching arises because functional languages treat the equals sign as an assertion of equivalence instead of assignment.
Long answer: Pattern matching is a form of dispatch based on the “shape” of the value that it's given. In a functional language, the datatypes that you define are usually what are known as discriminated unions or algebraic data types. For instance, what's a (linked) list? A linked list List of things of some type a is either the empty list Nil or some element of type a Consed onto a List a (a list of as). In Haskell (the functional language I'm most familiar with), we write this
data List a = Nil
| Cons a (List a)
All discriminated unions are defined this way: a single type has a fixed number of different ways to create it; the creators, like Nil and Cons here, are called constructors. This means that a value of the type List a could have been created with two different constructors—it could have two different shapes. So suppose we want to write a head function to get the first element of the list. In Haskell, we would write this as
-- `head` is a function from a `List a` to an `a`.
head :: List a -> a
-- An empty list has no first item, so we raise an error.
head Nil = error "empty list"
-- If we are given a `Cons`, we only want the first part; that's the list's head.
head (Cons h _) = h
Since List a values can be of two different kinds, we need to handle each one separately; this is the pattern matching. In head x, if x matches the pattern Nil, then we run the first case; if it matches the pattern Cons h _, we run the second.
Short answer, explained: I think one of the best ways to think about this behavior is by changing how you think of the equals sign. In the curly-bracket languages, by and large, = denotes assignment: a = b means “make a into b.” In a lot of functional languages, however, = denotes an assertion of equality: let Cons a (Cons b Nil) = frob x asserts that the thing on the left, Cons a (Cons b Nil), is equivalent to the thing on the right, frob x; in addition, all variables used on the left become visible. This is also what's happening with function arguments: we assert that the first argument looks like Nil, and if it doesn't, we keep checking.
It means that instead of writing
double f(int x, int y) {
if (y == 0) {
if (x == 0)
return NaN;
else if (x > 0)
return Infinity;
else
return -Infinity;
} else
return (double)x / y;
}
You can write
f(0, 0) = NaN;
f(x, 0) | x > 0 = Infinity;
| else = -Infinity;
f(x, y) = (double)x / y;
Hey, C++ supports pattern matching too.
static const int PositiveInfinity = -1;
static const int NegativeInfinity = -2;
static const int NaN = -3;
template <int x, int y> struct Divide {
enum { value = x / y };
};
template <bool x_gt_0> struct aux { enum { value = PositiveInfinity }; };
template <> struct aux<false> { enum { value = NegativeInfinity }; };
template <int x> struct Divide<x, 0> {
enum { value = aux<(x>0)>::value };
};
template <> struct Divide<0, 0> {
enum { value = NaN };
};
#include <cstdio>
int main () {
printf("%d %d %d %d\n", Divide<7,2>::value, Divide<1,0>::value, Divide<0,0>::value, Divide<-1,0>::value);
return 0;
};
Pattern matching is sort of like overloaded methods on steroids. The simplest case would be the same roughly the same as what you seen in java, arguments are a list of types with names. The correct method to call is based on the arguments passed in, and it doubles as an assignment of those arguments to the parameter name.
Patterns just go a step further, and can destructure the arguments passed in even further. It can also potentially use guards to actually match based on the value of the argument. To demonstrate, I'll pretend like JavaScript had pattern matching.
function foo(a,b,c){} //no pattern matching, just a list of arguments
function foo2([a],{prop1:d,prop2:e}, 35){} //invented pattern matching in JavaScript
In foo2, it expects a to be an array, it breaks apart the second argument, expecting an object with two props (prop1,prop2) and assigns the values of those properties to variables d and e, and then expects the third argument to be 35.
Unlike in JavaScript, languages with pattern matching usually allow multiple functions with the same name, but different patterns. In this way it is like method overloading. I'll give an example in erlang:
fibo(0) -> 0 ;
fibo(1) -> 1 ;
fibo(N) when N > 0 -> fibo(N-1) + fibo(N-2) .
Blur your eyes a little and you can imagine this in javascript. Something like this maybe:
function fibo(0){return 0;}
function fibo(1){return 1;}
function fibo(N) when N > 0 {return fibo(N-1) + fibo(N-2);}
Point being that when you call fibo, the implementation it uses is based on the arguments, but where Java is limited to types as the only means of overloading, pattern matching can do more.
Beyond function overloading as shown here, the same principle can be applied other places, such as case statements or destructuring assingments. JavaScript even has this in 1.7.
Pattern matching allows you to match a value (or an object) against some patterns to select a branch of the code. From the C++ point of view, it may sound a bit similar to the switch statement. In functional languages, pattern matching can be used for matching on standard primitive values such as integers. However, it is more useful for composed types.
First, let's demonstrate pattern matching on primitive values (using extended pseudo-C++ switch):
switch(num) {
case 1:
// runs this when num == 1
case n when n > 10:
// runs this when num > 10
case _:
// runs this for all other cases (underscore means 'match all')
}
The second use deals with functional data types such as tuples (which allow you to store multiple objects in a single value) and discriminated unions which allow you to create a type that can contain one of several options. This sounds a bit like enum except that each label can also carry some values. In a pseudo-C++ syntax:
enum Shape {
Rectangle of { int left, int top, int width, int height }
Circle of { int x, int y, int radius }
}
A value of type Shape can now contain either Rectangle with all the coordinates or a Circle with the center and the radius. Pattern matching allows you to write a function for working with the Shape type:
switch(shape) {
case Rectangle(l, t, w, h):
// declares variables l, t, w, h and assigns properties
// of the rectangle value to the new variables
case Circle(x, y, r):
// this branch is run for circles (properties are assigned to variables)
}
Finally, you can also use nested patterns that combine both of the features. For example, you could use Circle(0, 0, radius) to match for all shapes that have the center in the point [0, 0] and have any radius (the value of the radius will be assigned to the new variable radius).
This may sound a bit unfamiliar from the C++ point of view, but I hope that my pseudo-C++ make the explanation clear. Functional programming is based on quite different concepts, so it makes better sense in a functional language!
Pattern matching is where the interpreter for your language will pick a particular function based on the structure and content of the arguments you give it.
It is not only a functional language feature but is available for many different languages.
The first time I came across the idea was when I learned prolog where it is really central to the language.
e.g.
last([LastItem], LastItem).
last([Head|Tail], LastItem) :-
last(Tail, LastItem).
The above code will give the last item of a list. The input arg is the first and the result is the second.
If there is only one item in the list the interpreter will pick the first version and the second argument will be set to equal the first i.e. a value will be assigned to the result.
If the list has both a head and a tail the interpreter will pick the second version and recurse until it there is only one item left in the list.
For many people, picking up a new concept is easier if some easy examples are provided, so here we go:
Let's say you have a list of three integers, and wanted to add the first and the third element. Without pattern matching, you could do it like this (examples in Haskell):
Prelude> let is = [1,2,3]
Prelude> head is + is !! 2
4
Now, although this is a toy example, imagine we would like to bind the first and third integer to variables and sum them:
addFirstAndThird is =
let first = head is
third = is !! 3
in first + third
This extraction of values from a data structure is what pattern matching does. You basically "mirror" the structure of something, giving variables to bind for the places of interest:
addFirstAndThird [first,_,third] = first + third
When you call this function with [1,2,3] as its argument, [1,2,3] will be unified with [first,_,third], binding first to 1, third to 3 and discarding 2 (_ is a placeholder for things you don't care about).
Now, if you only wanted to match lists with 2 as the second element, you can do it like this:
addFirstAndThird [first,2,third] = first + third
This will only work for lists with 2 as their second element and throw an exception otherwise, because no definition for addFirstAndThird is given for non-matching lists.
Until now, we used pattern matching only for destructuring binding. Above that, you can give multiple definitions of the same function, where the first matching definition is used, thus, pattern matching is a little like "a switch statement on stereoids":
addFirstAndThird [first,2,third] = first + third
addFirstAndThird _ = 0
addFirstAndThird will happily add the first and third element of lists with 2 as their second element, and otherwise "fall through" and "return" 0. This "switch-like" functionality can not only be used in function definitions, e.g.:
Prelude> case [1,3,3] of [a,2,c] -> a+c; _ -> 0
0
Prelude> case [1,2,3] of [a,2,c] -> a+c; _ -> 0
4
Further, it is not restricted to lists, but can be used with other types as well, for example matching the Just and Nothing value constructors of the Maybe type in order to "unwrap" the value:
Prelude> case (Just 1) of (Just x) -> succ x; Nothing -> 0
2
Prelude> case Nothing of (Just x) -> succ x; Nothing -> 0
0
Sure, those were mere toy examples, and I did not even try to give a formal or exhaustive explanation, but they should suffice to grasp the basic concept.
You should start with the Wikipedia page that gives a pretty good explanation. Then, read the relevant chapter of the Haskell wikibook.
This is a nice definition from the above wikibook:
So pattern matching is a way of
assigning names to things (or binding
those names to those things), and
possibly breaking down expressions
into subexpressions at the same time
(as we did with the list in the
definition of map).
Here is a really short example that shows pattern matching usefulness:
Let's say you want to sort up an element in a list:
["Venice","Paris","New York","Amsterdam"]
to (I've sorted up "New York")
["Venice","New York","Paris","Amsterdam"]
in an more imperative language you would write:
function up(city, cities){
for(var i = 0; i < cities.length; i++){
if(cities[i] === city && i > 0){
var prev = cities[i-1];
cities[i-1] = city;
cities[i] = prev;
}
}
return cities;
}
In a functional language you would instead write:
let up list value =
match list with
| [] -> []
| previous::current::tail when current = value -> current::previous::tail
| current::tail -> current::(up tail value)
As you can see the pattern matched solution has less noise, you can clearly see what are the different cases and how easy it's to travel and de-structure our list.
I've written a more detailed blog post about it here.

Resources