I would like to create a function remove_duplicates that takes a list of any type (e.g. can be an int list or a bool list or a int list list or a whatever list) and returns the same list without duplicates, is this possible in Standard ML?
Is a function that takes a list of any type and returns the list without duplicates possible in Standard ML?
No.
To determine if one element is a duplicate of another, their values must be comparable. "Any type", or 'a in Standard ML, is not comparable for equality. So while you cannot have a val nub : 'a list -> 'a list that removes duplicates, here are four alternative options:
What #qouify suggests, the built-in equality type ''a, so anything you can use = on:
val nub : ''a list -> ''a list
What #kopecs suggests, a function that takes an equality operator as parameter:
val nub : ('a * 'a -> bool) -> 'a list -> 'a list
Which is a generalisation of 1., since here, nub op= : ''a list -> ''a list. This solution is kind of neat since it lets you remove not only duplicates, but also redundant representatives of arbitrary equivalence classes, e.g. nub (fn (x, y) => (x mod 3) = (y mod 3)) will only preserve integers that are distinct modulo 3. But its complexity is O(n²). (-_- )ノ⌒┻━┻
Because it is O(n²), nub is considered harmful.
As the article also suggests, the alternative is to use ordering rather than equality to reduce the complexity to O(n log n). While in Haskell this means only changing the type class constraint:
nub :: Eq a => [a] -> [a]
nubOrd :: Ord a => [a] -> [a]
and adjusting the algorithm, it gets a little more complicated to express this constraint in SML. While we do have ''a to represent Eq a => a (that we can use = on our input), we don't have a similar special syntax support for elements that can be compared as less/equal/greater, and we also don't have type classes. We do have the following built-in order type:
datatype order = LESS | EQUAL | GREATER
so if you like kopecs' solution, a variation with a better running time is:
val nubOrd : ('a * 'a -> order) -> 'a list -> 'a list
since it can use something like a mathematical set of previously seen elements, implemented using some kind of balanced search tree; n inserts each of complexity O(log n) takes a total of O(n log n) steps.
One of SML's winner features is its composable module system. Instead of using parametric polymorphism and feeding the function nubOrd with an order comparison function, you can create a module that takes another module as a parameter (a functor).
First, let's define a signature for modules that represent ordering of types:
signature ORD =
sig
type t
val compare : t * t -> order
end
(Notice that there isn't a ' in front of t.)
This means that anyone could make a struct ... end : ORD by specifying a t and a corresponding compare function for ts. Many built-in types have pre-defined compare functions: int has Int.compare and real has Real.compare.
Then, define a tree-based set data structure; I've used a binary search tree, and I've skipped most functions but the ones strictly necessary to perform this feat. Ideally you might extend the interface and use a better tree type, such as a self-balancing tree. (Unfortunately, since you've tagged this Q&A both as SML/NJ and Moscow ML, I wasn't sure which module to use, since they extend the standard library in different ways when it comes to balanced trees.)
functor TreeSet (X : ORD) =
struct
type t = X.t
datatype 'a tree = Leaf | Branch of 'a tree * 'a * 'a tree
val empty = Leaf
fun member (x, Leaf) = false
| member (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => true
| LESS => member (x, left)
| GREATER => member (x, right)
fun insert (x, Leaf) = Branch (Leaf, x, Leaf)
| insert (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => Branch (left, y, right)
| LESS => Branch (insert (x, left), y, right)
| GREATER => Branch (left, y, insert (x, right))
end
Lastly, the ListUtils functor contains the nubOrd utility function. The functor takes a structure X : ORD just like the TreeSet functor does. It creates an XSet structure by specialising the TreeSet functor using the same ordering module. It then uses this XSet to efficiently keep a record of the elements it has seen before.
functor ListUtils (X : ORD) =
struct
structure XSet = TreeSet(X)
fun nubOrd (xs : X.t list) =
let
val init = ([], XSet.empty)
fun go (x, (ys, seen)) =
if XSet.member (x, seen)
then (ys, seen)
else (x::ys, XSet.insert (x, seen))
in rev (#1 (foldl go init xs))
end
end
Using this functor to remove duplicates in an int list:
structure IntListUtils = ListUtils(struct
type t = int
val compare = Int.compare
end)
val example = IntListUtils.nubOrd [1,1,2,1,3,1,2,1,3,3,2,1,4,3,2,1,5,4,3,2,1]
(* [1, 2, 3, 4, 5] *)
The purpose of all that mess is a nubOrd without a direct extra function parameter.
Unfortunately, in order for this to extend to int list list, you need to create the compare function for that type, since unlike Int.compare, there isn't a generic one available in the standard library either. (This is where Haskell is a lot more ergonomic.)
So you might go and write a generic, lexicographical list compare function: If you know how to compare two elements of type 'a, you know how to compare two lists of those, no matter what the element type is:
fun listCompare _ ([], []) = EQUAL (* empty lists are equal *)
| listCompare _ ([], ys) = LESS (* empty is always smaller than non-empty *)
| listCompare _ (xs, []) = GREATER (* empty is always smaller than non-empty *)
| listCompare compare (x::xs, y::ys) =
case compare (x, y) of
EQUAL => listCompare compare (xs, ys)
| LESS => LESS
| GREATER => GREATER
And now,
structure IntListListUtils = ListUtils(struct
type t = int list
val compare = listCompare Int.compare
end)
val example2 = IntListListUtils.nubOrd [[1,2,3],[1,2,3,2],[1,2,3]]
(* [[1,2,3],[1,2,3,2]] *)
So even though [1,2,3] and [1,2,3,2] contain duplicates, they are not EQUAL when you compare them. But the third element is EQUAL to the first one, and so it gets removed as a duplicate.
Some last observations:
You may consider that even though each compare is only run O(log n) times, a single compare for some complex data structure, such as a (whatever * int) list list may still be expensive. So another improvement you can make here is to cache the result of every compare output, which is actually what Haskell's nubOrdOn operator does. ┳━┳ ヽ(ಠل͜ಠ)ノ
The functor approach is used extensively in Jane Street's OCaml Base library. The quick solution was to pass around an 'a * 'a -> order function around every single time you nub something. One moral, though, is that while the module system does add verbosity, if you provide enough of this machinery in a standard library, it will become quite convenient.
If you think the improvement from O(n²) to O(n log n) is not enough, consider Fritz Henglein's Generic top-down discrimination for sorting and partitioning in linear time (2012) and Edward Kmett's Haskell discrimination package's nub for a O(n) nub.
Yes. This is possible in SML through use of parametric polymorphism. You want a function of most general type 'a list -> 'a list where 'a is a type variable (i.e., variable that ranges over types) that would be read as alpha.
For some more concrete examples of how you might apply this (the explicit type variable after fun is optional):
fun 'a id (x : 'a) : 'a = x
Here we have the identity function with type 'a -> 'a.
We can declare similar functions with some degree of specialisation of the types, for instance
fun map _ [] = []
| map f (x::xs) = f x :: map f xs
Where map has most general type ('a -> 'b) -> 'a list -> 'b list, i.e, takes two curried arguments, one with some function type and another with some list type (agrees with function's domain) and returns a new list with type given by the codomain of the function.
For your specific problem you'll probably also want to take an equality function in order to determine what is a "duplicate" or you'll probably restrict yourself to "equality types" (types that can be compared with op=, represented by type variables with two leading apostrophes, e.g., ''a).
Yes sml provides polymorphism to do such things. In many cases you actually don't care for the type of the item in your lists (or other structures). For instance this function checks (already present in the List structure) for the existence of an item in a list:
fun exists _ [] = false
| exists x (y :: l) = x = y orelse exists x l
Such function works for any type of list as long as the equal operator is defined for this type (such type is called an equality type). You can do the same for remove_duplicates. In order to work with list of items of non equality types you will have to give remove_duplicates an additional function that checks if two items are equal.
How can I define constant sets in Isabelle ? For example something like {1,2,3} (to give it a more interesting twist with 1,2,3 being reals), or {x \in N: x < m}, where m is some fixed number - or, perhaps more difficult, the set {N,R,C}, where N are the naturals numbers, R the real and C the complex ones.
I imagine in all cases it has to be something like
definition a_set :: set
where "a_set ⟷ ??? "
but various attempts of replacing ??? with something correct failed.
Somehow all the tutorial I found talk about defining functions on sets - but I couldn't find simple examples like these to learn from.
The definition command defines a constant. It takes a single equation with the symbol to be defined on the left-hand side, e.g. definition "x = 5" or definition "f = (λx. x + 1)". For increased readability, function arguments can appear on the left-hand side of the equation, e.g. f x = x + 1.
The problem is that you're using ⟷ (the ‘if and only if’ operator, i.e. equality of Booleans). When you have Booleans, it's a good idea to use this instead of simply = because it saves parentheses: You can write ‘P x ⟷ x = 2 ∨ x = 5’ instead of ‘P x = (x = 2 ∨ x = 5)’. (The = operator binds more strongly than the logical connectives ∨ and ∧; ⟷, on the other hand, binds more weakly)
⟷ is just another way of writing = specialised to Booleans. That means that if you're defining something that doesn't return a Boolean, ⟷ is not going to work. Just use regular =:
definition A :: "real set" where
"A = {1, 2, 3}"
Or, for your other example:
definition B :: "complex set set" where
"B = {ℕ, ℝ, UNIV}"
Note that HOL is a typed logic; that means that you cannot just do
definition a_set :: set
because there is no type of all sets. There is only a type of all sets whose elements have a specific type, e.g. nat set or (real ⇒ real) set or indeed nat set set. Just saying set will give you an error message ‘Could not parse type’ because set is a type constructor that expects one type argument and you have given it none.
Regarding the set {ℕ, ℝ, ℂ}, this is the constant B I defined above as an example. There is no ℂ in Isabelle because that is just UNIV :: complex set. (UNIV being the set of all values of the type in question). Note that the ℕ and ℝ in that case are the set of natural and real numbers as a subset of the complex numbers.
I have doubt about what's the real meaning of "order" when we say "high order" functions? E.g., I've got an embedded function call like:
f.g.h
So is it called a "3 order" function?
Is "High order" function a concept of static function accumulation? Then when I have a recursive function f, and in runtime its calling stack is like f.f.f.f. can we say f is high order function?
Thanks a lot.
The order is basically the nesting level of arrows in the type.
This lecture slide on Functional Programming defines
The order of data
Order 0: Non function data
Order 1: Functions with domain and range of order 0
Order 2: Functions with domain and range of order 1
Order k: Functions with domain and range of order k-1
So basically a zeroth-order function is no function, a first-order normal function that only operates on data, and everything else is a higher-order function.
(These slides seem to have an off-by-one error)
Let's get to some (Haskell) examples:
-- no arrow, clearly some plain data
x :: Int
x = 0
-- one arrow, so it's a first-order function:
add2 :: Int -> Int
add2 = (+ 2)
-- still one arrow only:
add :: (Int, Int) -> Int
add = uncurry (+)
-- and this is a first-order function as well:
add4 :: Int -> Int
add4 = add2 . add2
As you can see, it doesn't matter whether you are using function composition (a higher-order function) to define your functions, only their result type matters. Therefore your f.g.h and f.f.f.f examples are only first-order functions (given that f is one).
Simple examples of higher-order functions are multivariate functions:
-- two arrows! Looks like a second-order function
plus :: Int -> Int -> Int
plus = (+)
The type of the curried function is actually Int -> (Int -> Int) where we can clearly see that it's a function that has a first-order function as the result, so it is of order 2. The lower order of the input (0) doesn't really matter.
We can see the same in a more interesting example, function composition:
compose :: ((b -> c), (a -> b)) -> a -> c
compose (f, g) x = f (g x)
Here both of the parameters and the result are each a first-order function, so compose is of order 2.
Another example is the fixpoint combinator fix :: (a -> a) -> a which does have a first-order function as the input and a zeroth-order result, making it second order overall.
And the curried composition operator as we know it
(.) :: (b -> c) -> ((a -> b) -> (a -> c))
would even be considered a third-order function.
An example (javaish)
Function<Double, Double> oneOf(Function<Double, Double>... fs) {
return fs[new Random().nextInt(fs.length)];
}
double y = oneOf(cos, sin, atan).applyTo(1.23);
This is order 2 for two reasons: parameter type and especially the result type are functions.
I am learning OCaml. I know that OCaml provides us with both imperative style of programming and functional programming.
I came across this code as part of my course to compute the n'th Fibonacci number in OCaml
let memoise f =
let table = ref []
in
let rec find tab n =
match tab with
| [] ->
let v = (f n)
in
table := (n, v) :: !table;
v
| (n', v) :: t ->
if n' = n then v else (find t n)
in
fun n -> find !table n
let fibonacci2 = memoise fibonacci1
Where the function fibonacci1 is implemented in the standard way as follows:
let rec fibonacci1 n =
match n with
| 0 | 1 -> 1
| _ -> (fibonacci1 (n - 1)) + (fibonacci1 (n - 2))
Now my question is that how are we achieving memoisation in fibonacci2. table has been defined inside the function fibonacci2 and thus, my logic dictates that after the function finishes computation, the list table should get lost and after each call the table will get built again and again.
I ran some a simple test where I called the function fibonacci 35 twice in the OCaml REPL and the second function call returned the answer significantly faster than the first call to the function (contrary to my expectations).
I though that this might be possible if declaring a variable using ref gives it a global scope by default.
So I tried this
let f y = let x = ref 5 in y;;
print_int !x;;
But this gave me an error saying that the value of x is unbounded.
Why does this behave this way?
The function memoise returns a value, call it f. (f happens to be a function). Part of that value is the table. Every time you call memoise you're going to get a different value (with a different table).
In the example, the returned value f is given the name fibonacci2. So, the thing named fibonacci2 has a table inside it that can be used by the function f.
There is no global scope by default, that would be a huge mess. At any rate, this is a question of lifetime not of scope. Lifetimes in OCaml last as long as an object can be reached somehow. In the case of the table, it can be reached through the returned function, and hence it lasts as long as the function does.
In your second example you are testing the scope (not the lifetime) of x, and indeed the scope of x is restricted to the subexpresssion of its let. (I.e., it is meaningful only in the expression y, where it's not used.) In the original code, all the uses of table are within its let, hence there's no problem.
Although references are a little tricky, the underlying semantics of OCaml come from lambda calculus, and are extremely clean. That's why it's such a delight to code in OCaml (IMHO).
I am trying to write a function that returns the index of the passed value v in a given list x; -1 if not found. My attempt at the solution:
let rec index (x, v) =
let i = 0 in
match x with
[] -> -1
| (curr::rest) -> if(curr == v) then
i
else
succ i; (* i++ *)
index(rest, v)
;;
This is obviously wrong to me (it will return -1 every time) because it redefines i at each pass. I have some obscure ways of doing it with separate functions in my head, none which I can write down at the moment. I know this is a common pattern in all programming, so my question is, what's the best way to do this in OCaml?
Mutation is not a common way to solve problems in OCaml. For this task, you should use recursion and accumulate results by changing the index i on certain conditions:
let index(x, v) =
let rec loop x i =
match x with
| [] -> -1
| h::t when h = v -> i
| _::t -> loop t (i+1)
in loop x 0
Another thing is that using -1 as an exceptional case is not a good idea. You may forget this assumption somewhere and treat it as other indices. In OCaml, it's better to treat this exception using option type so the compiler forces you to take care of None every time:
let index(x, v) =
let rec loop x i =
match x with
| [] -> None
| h::t when h = v -> Some i
| _::t -> loop t (i+1)
in loop x 0
This is pretty clearly a homework problem, so I'll just make two comments.
First, values like i are immutable in OCaml. Their values don't change. So succ i doesn't do what your comment says. It doesn't change the value of i. It just returns a value that's one bigger than i. It's equivalent to i + 1, not to i++.
Second the essence of recursion is to imagine how you would solve the problem if you already had a function that solves the problem! The only trick is that you're only allowed to pass this other function a smaller version of the problem. In your case, a smaller version of the problem is one where the list is shorter.
You can't mutate variables in OCaml (well, there is a way but you really shouldn't for simple things like this)
A basic trick you can do is create a helper function that receives extra arguments corresponding to the variables you want to "mutate". Note how I added an extra parameter for the i and also "mutate" the current list head in a similar way.
let rec index_helper (x, vs, i) =
match vs with
[] -> -1
| (curr::rest) ->
if(curr == x) then
i
else
index_helper (x, rest, i+1)
;;
let index (x, vs) = index_helper (x, vs, 0) ;;
This kind of tail-recursive transformation is a way to translate loops to functional programming but to be honest it is kind of low level (you have full power but the manual recursion looks like programming with gotos...).
For some particular patterns what you can instead try to do is take advantage of reusable higher order functions, such as map or folds.