I would like to create a function remove_duplicates that takes a list of any type (e.g. an int list, a bool list, an int list list, or a whatever list) and returns the same list without duplicates. Is this possible in Standard ML?
Is a function that takes a list of any type and returns the list without duplicates possible in Standard ML?
No.
To determine if one element is a duplicate of another, their values must be comparable. "Any type", or 'a in Standard ML, is not comparable for equality. So while you cannot have a val nub : 'a list -> 'a list that removes duplicates, here are four alternative options:
1. What @qouify suggests, the built-in equality type ''a, so anything you can use = on:
val nub : ''a list -> ''a list
2. What @kopecs suggests, a function that takes an equality operator as parameter:
val nub : ('a * 'a -> bool) -> 'a list -> 'a list
Which is a generalisation of 1., since here, nub op= : ''a list -> ''a list. This solution is kind of neat since it lets you remove not only duplicates, but also redundant representatives of arbitrary equivalence classes, e.g. nub (fn (x, y) => (x mod 3) = (y mod 3)) will only preserve integers that are distinct modulo 3. But its complexity is O(n²). (-_- )ノ⌒┻━┻
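A minimal sketch of the equality-parameter version, assuming the simple strategy of scanning the elements kept so far (which is where the quadratic cost comes from). The names nubBy, nub, distinct and mod3 are chosen here for illustration only:

```sml
(* nubBy eq xs keeps the first representative of each eq-class, in order.
   Quadratic: every element is checked against all elements kept so far. *)
fun nubBy eq xs =
  let
    fun go ([], kept) = rev kept
      | go (x :: rest, kept) =
          if List.exists (fn y => eq (y, x)) kept
          then go (rest, kept)
          else go (rest, x :: kept)
  in
    go (xs, [])
  end

(* Specialising to the built-in equality gives the ''a version. *)
fun nub xs = nubBy op= xs

val distinct = nub [1, 1, 2, 1, 3]
val mod3 = nubBy (fn (x, y) => x mod 3 = y mod 3) [1, 2, 3, 4, 5, 6]
```

Here mod3 keeps only 1, 2 and 3, since 4, 5 and 6 fall into equivalence classes that are already represented.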
Because it is O(n²), nub is considered harmful.
3. As the article also suggests, the alternative is to use ordering rather than equality to reduce the complexity to O(n log n). While in Haskell this means only changing the type class constraint:
nub :: Eq a => [a] -> [a]
nubOrd :: Ord a => [a] -> [a]
and adjusting the algorithm, it gets a little more complicated to express this constraint in SML. While we do have ''a to represent Eq a => a (that we can use = on our input), we don't have a similar special syntax support for elements that can be compared as less/equal/greater, and we also don't have type classes. We do have the following built-in order type:
datatype order = LESS | EQUAL | GREATER
so if you like kopecs' solution, a variation with a better running time is:
val nubOrd : ('a * 'a -> order) -> 'a list -> 'a list
since it can use something like a mathematical set of previously seen elements, implemented using some kind of balanced search tree; n inserts each of complexity O(log n) takes a total of O(n log n) steps.
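A rough sketch of such a nubOrd, assuming a plain (unbalanced) binary search tree as the set of seen elements. The type seenTree and the helper insertNew are hypothetical names introduced here. Because the tree is not balanced, this only reaches O(n log n) on inputs that keep it reasonably shallow; a self-balancing tree would be needed for the worst-case bound:

```sml
(* Unbalanced binary search tree holding the elements seen so far. *)
datatype 'a seenTree = Leaf | Branch of 'a seenTree * 'a * 'a seenTree

fun nubOrd cmp xs =
  let
    (* SOME tree' if x was absent and has been inserted, NONE if x was present. *)
    fun insertNew (x, Leaf) = SOME (Branch (Leaf, x, Leaf))
      | insertNew (x, Branch (l, y, r)) =
          (case cmp (x, y) of
               EQUAL => NONE
             | LESS => Option.map (fn l' => Branch (l', y, r)) (insertNew (x, l))
             | GREATER => Option.map (fn r' => Branch (l, y, r')) (insertNew (x, r)))

    (* Keep x only when the insertion reports it as new. *)
    fun go (x, (ys, seen)) =
      case insertNew (x, seen) of
          NONE => (ys, seen)
        | SOME seen' => (x :: ys, seen')
  in
    rev (#1 (foldl go ([], Leaf) xs))
  end

val ints = nubOrd Int.compare [3, 1, 3, 2, 1]
val strs = nubOrd String.compare ["b", "a", "b", "c"]
```

Combining the membership test and the insertion into one traversal is a design choice of this sketch, not something the signature requires.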
4. One of SML's winner features is its composable module system. Instead of using parametric polymorphism and feeding the function nubOrd with an order comparison function, you can create a module that takes another module as a parameter (a functor).
First, let's define a signature for modules that represent ordering of types:
signature ORD =
sig
type t
val compare : t * t -> order
end
(Notice that there isn't a ' in front of t.)
This means that anyone could make a struct ... end : ORD by specifying a t and a corresponding compare function for ts. Many built-in types have pre-defined compare functions: int has Int.compare and real has Real.compare.
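To make that concrete, here are a few structures matching the signature; the signature is repeated so the snippet stands alone. The structure names IntOrd, StringOrd and IntDescOrd are made up for this illustration:

```sml
signature ORD =
sig
  type t
  val compare : t * t -> order
end

(* Transparent ascription (:) keeps t = int visible to users. *)
structure IntOrd : ORD =
struct
  type t = int
  val compare = Int.compare
end

structure StringOrd : ORD =
struct
  type t = string
  val compare = String.compare
end

(* A hypothetical ordering that reverses the usual one on integers. *)
structure IntDescOrd : ORD =
struct
  type t = int
  fun compare (x, y) = Int.compare (y, x)
end

val a = IntOrd.compare (1, 2)
val b = StringOrd.compare ("pear", "apple")
val c = IntDescOrd.compare (1, 2)
```

The same type can have several ORD structures, so the ordering is chosen by picking a module rather than by the type alone.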
Then, define a tree-based set data structure; I've used a binary search tree, and I've skipped most functions but the ones strictly necessary to perform this feat. Ideally you might extend the interface and use a better tree type, such as a self-balancing tree. (Unfortunately, since you've tagged this Q&A both as SML/NJ and Moscow ML, I wasn't sure which module to use, since they extend the standard library in different ways when it comes to balanced trees.)
functor TreeSet (X : ORD) =
struct
type t = X.t
datatype 'a tree = Leaf | Branch of 'a tree * 'a * 'a tree
val empty = Leaf
fun member (x, Leaf) = false
| member (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => true
| LESS => member (x, left)
| GREATER => member (x, right)
fun insert (x, Leaf) = Branch (Leaf, x, Leaf)
| insert (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => Branch (left, y, right)
| LESS => Branch (insert (x, left), y, right)
| GREATER => Branch (left, y, insert (x, right))
end
Lastly, the ListUtils functor contains the nubOrd utility function. The functor takes a structure X : ORD just like the TreeSet functor does. It creates an XSet structure by specialising the TreeSet functor using the same ordering module. It then uses this XSet to efficiently keep a record of the elements it has seen before.
functor ListUtils (X : ORD) =
struct
structure XSet = TreeSet(X)
fun nubOrd (xs : X.t list) =
let
val init = ([], XSet.empty)
fun go (x, (ys, seen)) =
if XSet.member (x, seen)
then (ys, seen)
else (x::ys, XSet.insert (x, seen))
in rev (#1 (foldl go init xs))
end
end
Using this functor to remove duplicates in an int list:
structure IntListUtils = ListUtils(struct
type t = int
val compare = Int.compare
end)
val example = IntListUtils.nubOrd [1,1,2,1,3,1,2,1,3,3,2,1,4,3,2,1,5,4,3,2,1]
(* [1, 2, 3, 4, 5] *)
The purpose of all that mess is a nubOrd without a direct extra function parameter.
Unfortunately, in order for this to extend to int list list, you need to create the compare function for that type, since unlike Int.compare, there isn't a ready-made compare for int list in the standard library; the closest is List.collate in the Basis Library, which builds one from an element comparison. (This is where Haskell is a lot more ergonomic.)
So you might go and write a generic, lexicographical list compare function: If you know how to compare two elements of type 'a, you know how to compare two lists of those, no matter what the element type is:
fun listCompare _ ([], []) = EQUAL (* empty lists are equal *)
| listCompare _ ([], ys) = LESS (* empty is always smaller than non-empty *)
| listCompare _ (xs, []) = GREATER (* non-empty is always greater than empty *)
| listCompare compare (x::xs, y::ys) =
case compare (x, y) of
EQUAL => listCompare compare (xs, ys)
| LESS => LESS
| GREATER => GREATER
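The same lexicographic order is what the Basis Library's List.collate computes. Assuming your implementation provides it (it is part of the Basis List signature), a quick check of the three possible outcomes might look like this; the value names are illustrative:

```sml
(* A proper prefix compares as LESS than the longer list. *)
val shorterFirst = List.collate Int.compare ([1, 2, 3], [1, 2, 3, 2])

(* Element-wise equal lists of the same length are EQUAL. *)
val same = List.collate Int.compare ([1, 2, 3], [1, 2, 3])

(* The first differing position decides: 3 > 2, so GREATER. *)
val bigger = List.collate Int.compare ([1, 3], [1, 2, 3])
```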
And now,
structure IntListListUtils = ListUtils(struct
type t = int list
val compare = listCompare Int.compare
end)
val example2 = IntListListUtils.nubOrd [[1,2,3],[1,2,3,2],[1,2,3]]
(* [[1,2,3],[1,2,3,2]] *)
So even though [1,2,3] and [1,2,3,2] contain duplicates, they are not EQUAL when you compare them. But the third element is EQUAL to the first one, and so it gets removed as a duplicate.
Some last observations:
You may consider that even though each compare is only run O(log n) times, a single compare for some complex data structure, such as a (whatever * int) list list may still be expensive. So another improvement you can make here is to cache the result of every compare output, which is actually what Haskell's nubOrdOn operator does. ┳━┳ ヽ(ಠل͜ಠ)ノ
The functor approach is used extensively in Jane Street's OCaml Base library. The quick solution was to pass an 'a * 'a -> order function around every single time you nub something. One moral, though, is that while the module system does add verbosity, if you provide enough of this machinery in a standard library, it becomes quite convenient.
If you think the improvement from O(n²) to O(n log n) is not enough, consider Fritz Henglein's Generic top-down discrimination for sorting and partitioning in linear time (2012) and the nub in Edward Kmett's Haskell discrimination package for an O(n) nub.
Yes. This is possible in SML through use of parametric polymorphism. You want a function of most general type 'a list -> 'a list where 'a is a type variable (i.e., variable that ranges over types) that would be read as alpha.
For some more concrete examples of how you might apply this (the explicit type variable after fun is optional):
fun 'a id (x : 'a) : 'a = x
Here we have the identity function with type 'a -> 'a.
We can declare similar functions with some degree of specialisation of the types, for instance
fun map _ [] = []
| map f (x::xs) = f x :: map f xs
Here map has most general type ('a -> 'b) -> 'a list -> 'b list, i.e., it takes two curried arguments, one with some function type and another with some list type (whose element type agrees with the function's domain), and returns a new list whose element type is the codomain of the function.
For your specific problem you'll probably also want to take an equality function in order to determine what is a "duplicate" or you'll probably restrict yourself to "equality types" (types that can be compared with op=, represented by type variables with two leading apostrophes, e.g., ''a).
Yes, SML provides polymorphism for this. In many cases you don't actually care about the type of the items in your lists (or other structures). For instance, this function checks whether an item occurs in a list (the List structure provides a related List.exists, which takes a predicate rather than an item):
fun exists _ [] = false
| exists x (y :: l) = x = y orelse exists x l
Such a function works for any type of list as long as the equality operator is defined for its element type (such a type is called an equality type). You can do the same for remove_duplicates. To work with lists of items of non-equality types, you will have to give remove_duplicates an additional function that checks whether two items are equal.
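As a rough sketch of the equality-type version, assuming it is fine to keep the first occurrence of each item and to pay a quadratic cost:

```sml
(* Keeps the first occurrence of each item; works on any equality type.
   After keeping x, every later copy of x is filtered out of the tail. *)
fun remove_duplicates [] = []
  | remove_duplicates (x :: xs) =
      x :: remove_duplicates (List.filter (fn y => y <> x) xs)

val ints = remove_duplicates [1, 1, 2, 1, 3]
val bools = remove_duplicates [true, false, true]
val nested = remove_duplicates [[1], [1], [2]]
```

The inferred type is ''a list -> ''a list, so the same function covers int list, bool list and int list list.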
I'm brand new to SML/NJ and I'm trying to make a recursive function
that makes a listOfLists. Ex: listOf([1,2,3,4]) will output
[[1],[2],[3],[4]]. I've found a recursive merge in SML/NJ, and I'm
trying to use it as kind of an outline:
- fun merge(xs,nil) = xs
= | merge(nil,ys) = ys
= | merge(x::xs, y::ys) =
= if (x < y) then x::merge(xs, y::ys) else y::merge(x::xs,ys);
- fun listOf(xs) = xs
= | listOf(x::xs) = [x]::listOf(xs);
I'm trying to use pattern match and I'm a little confused on it. I'm
pretty sure x is the head and then xs is the tail, but I could be
wrong. So what I'm trying to do is use the head of the list, make it a
list, and then add it to the rest of the list. But when trying to do
this function, I get the error:
stdIn:15.19-15.34 Error: operator and operand don't agree [circularity]
operator domain: 'Z list * 'Z list list
operand: 'Z list * 'Z list
in expression:
(x :: nil) :: listOf xs
This error is foreign to me because I don't have really any experience
with sml/nj. How can I fix my listOf function?
You are fairly close. The problem is that in pattern-matching, a pattern like xs (just a variable) can match anything. The fact that you end it with s doesn't mean that the pattern can only match a tail of a list. Using s in that way is just a programmer convention in SML.
Thus, in your definition:
fun listOf(xs) = xs
| listOf(x::xs) = [x]::listOf(xs);
The first line tells SML to return all values unchanged, which is clearly not your intent. SML detects that this is inconsistent with the second line where you are trying to change a value after all.
You need to change that first line so that it doesn't match everything. Looking at that merge function as a template, you need something which matches a basis case. The natural basis case is nil (which can also be written as []). Note the role that nil plays in the definition of merge. If you use nil instead of xs for the pattern in the first line of your function definition, your second line does exactly what you want and the function will work as intended:
fun listOf(nil) = nil
| listOf(x::xs) = [x]::listOf(xs);
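For comparison, the same result can be sketched without explicit recursion by mapping a wrapping function over the list. This is an alternative formulation, not a correction of the version above; the name wrapped is illustrative:

```sml
(* Wrap every element in a singleton list. *)
fun listOf xs = map (fn x => [x]) xs

val wrapped = listOf [1, 2, 3, 4]
```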
I am trying to append one list to the end of another list. The usual :: operator gives me the following error:
This expression has type char list
but an expression was expected of type char
in the statement
`(createList 0 5 'a')::['c';'d';'e';'f';'g']`
(* the following createList expression returns the list ['a';'a';'a';'a';'a'] *)
When I use the @ operator, it appends the list fine. So my question is: what is the difference between @ and ::? Is it just that @ is used between two lists, while :: is used between list and non-list types?
@ concatenates two lists (type 'a list -> 'a list -> 'a list), while :: takes an element of a certain type and "prepends" it to a list containing elements of exactly the same type (i.e. :: has type 'a -> 'a list -> 'a list).
You can basically simulate a::b by [a]@b.
Note that @ requires OCaml to traverse the first list to find its last element. This takes O(n) time, where n is the number of elements in the first list. ::, on the other hand, requires O(1) time.
Regarding your example (createList 0 5 'a')::['c';'d';'e';'f';'g']:
(createList 0 5 'a') creates a list holding 'a's, i.e. it has type char list, and ['c';'d';'e';'f';'g'] is also of type char list. Thus, you can only use @ to concatenate them (see above), and :: makes no sense (see the type signature of :: above).
@ concatenates two lists.
:: adds an element to the head of a list.
We want to find the largest value in a given nonempty list of integers. Then we have to compare elements in the list. Since data
values are given as a sequence, we can do comparisons from the
beginning or from the end of the list. Define in both ways. a)
comparison from the beginning b) comparison from the end (How can we
do this when data values are in a list?) No auxiliary functions.
I've been playing around a lot with recursive functions, but can't seem to figure out how to compare two values in the list.
fun listCompare [] = 0
| listCompare [x] = x
| listCompare (x::xs) = listCompare(xs)
This will break the list down to the last element, but how do I start comparing and composing the list back up?
You could compare the first two elements of a given list and keep the larger element in the list and drop the other. Once the list has only one element, then you have the maximum. In functional pseudocode for a) it looks roughly like so:
lmax [] = error "empty list"
lmax [x] = x
lmax (x::y::xs) =
if x > y then lmax (x::xs)
else lmax (y::xs)
For b) you could reverse the list first.
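In Standard ML the pseudocode for a) translates almost directly. For b), instead of reversing, one possible reading of "from the end" is to recurse to the end of the list first and compare on the way back. Both are sketches; the name lmaxFromEnd and the choice of raising Empty are assumptions made here:

```sml
(* a) Compare from the beginning: the larger of the first two survives. *)
fun lmax [] = raise Empty
  | lmax [x] = x
  | lmax (x :: y :: xs) =
      if x > y then lmax (x :: xs) else lmax (y :: xs)

(* b) Compare from the end: find the maximum of the tail first,
      then compare it with the head while returning. *)
fun lmaxFromEnd [] = raise Empty
  | lmaxFromEnd [x] = x
  | lmaxFromEnd (x :: xs) =
      let
        val m = lmaxFromEnd xs
      in
        if x > m then x else m
      end

val fromStart = lmax [3, 1, 4, 1, 5, 9, 2, 6]
val fromEnd = lmaxFromEnd [3, 1, 4, 1, 5, 9, 2, 6]
```

Neither version uses an auxiliary function; the let in b) only names the result of the recursive call.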
This is what the foldl (or foldr) function in the SML list library is for:
foldl : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b
You can simply pass an anonymous function that compares the current element against the accumulator:
fun lMax l =
  foldl (fn (x,y) => if x > y then x else y) (List.nth (l, 0)) l
List.nth takes a pair of the int list l and the index 0 and returns the first element of the list (hd l would do the same); it raises Subscript if the list is empty. As lists in SML are built recursively as h :: t, retrieving the first element is an O(1) operation, and using foldl keeps the code short. The whole point of having a functional language is to define abstractions, pass anonymous functions to higher-order functions, and re-use the abstract type definitions with concrete functions.
I got an implementation of the append function in OCaml, but it seems confusing to me:
let rec append = function
| [] -> fun y -> y
| h :: t -> fun y -> h :: (append t y)
What is the purpose of the fun y in this case ?
The type of append is 'a list -> 'a list -> 'a list. You can look at this as a function that takes two lists and returns a list. But (as is idiomatic in OCaml) the function is defined using currying. So at the basic level, append takes a first list and returns a function of type 'a list -> 'a list. The returned function takes the second list and prefixes the first list to it (returning the result).
The value fun y -> y is the function that append returns when the first list is empty. If you think about it, this makes sense. If the first list is empty, the second list will be returned unchanged. In other words, the returned function is no different at all from an identity function (specialized for applying to lists).
The second case returns the value fun y -> h :: (append t y). This is similar, but a little more complicated. The returned function needs to do some actual appending. It does this by (recursively) appending the supplied second list (y) to the tail of the first list (t), then adding the head of the first list (h) to the front of that.
If you don't like the fun, you can rewrite the function like this
let rec append x y = match x with
| [] -> y
| h :: t -> h :: append t y