Related
I would like to create a function remove_duplicates that takes a list of any type (e.g. can be an int list or a bool list or a int list list or a whatever list) and returns the same list without duplicates, is this possible in Standard ML?
Is a function that takes a list of any type and returns the list without duplicates possible in Standard ML?
No.
To determine if one element is a duplicate of another, their values must be comparable. "Any type", or 'a in Standard ML, is not comparable for equality. So while you cannot have a val nub : 'a list -> 'a list that removes duplicates, here are four alternative options:
What #qouify suggests, the built-in equality type ''a, so anything you can use = on:
val nub : ''a list -> ''a list
What #kopecs suggests, a function that takes an equality operator as parameter:
val nub : ('a * 'a -> bool) -> 'a list -> 'a list
Which is a generalisation of 1., since here, nub op= : ''a list -> ''a list. This solution is kind of neat since it lets you remove not only duplicates, but also redundant representatives of arbitrary equivalence classes, e.g. nub (fn (x, y) => (x mod 3) = (y mod 3)) will only preserve integers that are distinct modulo 3. But its complexity is O(n²). (-_- )ノ⌒┻━┻
Because it is O(n²), nub is considered harmful.
As the article also suggests, the alternative is to use ordering rather than equality to reduce the complexity to O(n log n). While in Haskell this means only changing the type class constraint:
nub :: Eq a => [a] -> [a]
nubOrd :: Ord a => [a] -> [a]
and adjusting the algorithm, it gets a little more complicated to express this constraint in SML. While we do have ''a to represent Eq a => a (that we can use = on our input), we don't have a similar special syntax support for elements that can be compared as less/equal/greater, and we also don't have type classes. We do have the following built-in order type:
datatype order = LESS | EQUAL | GREATER
so if you like kopecs' solution, a variation with a better running time is:
val nubOrd : ('a * 'a -> order) -> 'a list -> 'a list
since it can use something like a mathematical set of previously seen elements, implemented using some kind of balanced search tree; n inserts each of complexity O(log n) takes a total of O(n log n) steps.
One of SML's winner features is its composable module system. Instead of using parametric polymorphism and feeding the function nubOrd with an order comparison function, you can create a module that takes another module as a parameter (a functor).
First, let's define a signature for modules that represent ordering of types:
signature ORD =
sig
type t
val compare : t * t -> order
end
(Notice that there isn't a ' in front of t.)
This means that anyone could make a struct ... end : ORD by specifying a t and a corresponding compare function for ts. Many built-in types have pre-defined compare functions: int has Int.compare and real has Real.compare.
Then, define a tree-based set data structure; I've used a binary search tree, and I've skipped most functions but the ones strictly necessary to perform this feat. Ideally you might extend the interface and use a better tree type, such as a self-balancing tree. (Unfortunately, since you've tagged this Q&A both as SML/NJ and Moscow ML, I wasn't sure which module to use, since they extend the standard library in different ways when it comes to balanced trees.)
functor TreeSet (X : ORD) =
struct
type t = X.t
datatype 'a tree = Leaf | Branch of 'a tree * 'a * 'a tree
val empty = Leaf
fun member (x, Leaf) = false
| member (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => true
| LESS => member (x, left)
| GREATER => member (x, right)
fun insert (x, Leaf) = Branch (Leaf, x, Leaf)
| insert (x, Branch (left, y, right)) =
case X.compare (x, y) of
EQUAL => Branch (left, y, right)
| LESS => Branch (insert (x, left), y, right)
| GREATER => Branch (left, y, insert (x, right))
end
Lastly, the ListUtils functor contains the nubOrd utility function. The functor takes a structure X : ORD just like the TreeSet functor does. It creates an XSet structure by specialising the TreeSet functor using the same ordering module. It then uses this XSet to efficiently keep a record of the elements it has seen before.
functor ListUtils (X : ORD) =
struct
structure XSet = TreeSet(X)
fun nubOrd (xs : X.t list) =
let
val init = ([], XSet.empty)
fun go (x, (ys, seen)) =
if XSet.member (x, seen)
then (ys, seen)
else (x::ys, XSet.insert (x, seen))
in rev (#1 (foldl go init xs))
end
end
Using this functor to remove duplicates in an int list:
structure IntListUtils = ListUtils(struct
type t = int
val compare = Int.compare
end)
val example = IntListUtils.nubOrd [1,1,2,1,3,1,2,1,3,3,2,1,4,3,2,1,5,4,3,2,1]
(* [1, 2, 3, 4, 5] *)
The purpose of all that mess is a nubOrd without a direct extra function parameter.
Unfortunately, in order for this to extend to int list list, you need to create the compare function for that type, since unlike Int.compare, there isn't a generic one available in the standard library either. (This is where Haskell is a lot more ergonomic.)
So you might go and write a generic, lexicographical list compare function: If you know how to compare two elements of type 'a, you know how to compare two lists of those, no matter what the element type is:
fun listCompare _ ([], []) = EQUAL (* empty lists are equal *)
| listCompare _ ([], ys) = LESS (* empty is always smaller than non-empty *)
| listCompare _ (xs, []) = GREATER (* empty is always smaller than non-empty *)
| listCompare compare (x::xs, y::ys) =
case compare (x, y) of
EQUAL => listCompare compare (xs, ys)
| LESS => LESS
| GREATER => GREATER
And now,
structure IntListListUtils = ListUtils(struct
type t = int list
val compare = listCompare Int.compare
end)
val example2 = IntListListUtils.nubOrd [[1,2,3],[1,2,3,2],[1,2,3]]
(* [[1,2,3],[1,2,3,2]] *)
So even though [1,2,3] and [1,2,3,2] contain duplicates, they are not EQUAL when you compare them. But the third element is EQUAL to the first one, and so it gets removed as a duplicate.
Some last observations:
You may consider that even though each compare is only run O(log n) times, a single compare for some complex data structure, such as a (whatever * int) list list may still be expensive. So another improvement you can make here is to cache the result of every compare output, which is actually what Haskell's nubOrdOn operator does. ┳━┳ ヽ(ಠل͜ಠ)ノ
The functor approach is used extensively in Jane Street's OCaml Base library. The quick solution was to pass around an 'a * 'a -> order function around every single time you nub something. One moral, though, is that while the module system does add verbosity, if you provide enough of this machinery in a standard library, it will become quite convenient.
If you think the improvement from O(n²) to O(n log n) is not enough, consider Fritz Henglein's Generic top-down discrimination for sorting and partitioning in linear time (2012) and Edward Kmett's Haskell discrimination package's nub for a O(n) nub.
Yes. This is possible in SML through use of parametric polymorphism. You want a function of most general type 'a list -> 'a list where 'a is a type variable (i.e., variable that ranges over types) that would be read as alpha.
For some more concrete examples of how you might apply this (the explicit type variable after fun is optional):
fun 'a id (x : 'a) : 'a = x
Here we have the identity function with type 'a -> 'a.
We can declare similar functions with some degree of specialisation of the types, for instance
fun map _ [] = []
| map f (x::xs) = f x :: map f xs
Where map has most general type ('a -> 'b) -> 'a list -> 'b list, i.e, takes two curried arguments, one with some function type and another with some list type (agrees with function's domain) and returns a new list with type given by the codomain of the function.
For your specific problem you'll probably also want to take an equality function in order to determine what is a "duplicate" or you'll probably restrict yourself to "equality types" (types that can be compared with op=, represented by type variables with two leading apostrophes, e.g., ''a).
Yes sml provides polymorphism to do such things. In many cases you actually don't care for the type of the item in your lists (or other structures). For instance this function checks (already present in the List structure) for the existence of an item in a list:
fun exists _ [] = false
| exists x (y :: l) = x = y orelse exists x l
Such function works for any type of list as long as the equal operator is defined for this type (such type is called an equality type). You can do the same for remove_duplicates. In order to work with list of items of non equality types you will have to give remove_duplicates an additional function that checks if two items are equal.
I would like to make this functions recursive but I don't know where to start.
let rec rlist r n =
if n < 1 then []
else Random.int r :: rlist r (n-1);;
let rec divide = function
h1::h2::t -> let t1,t2 = divide t in
h1::t1, h2::t2
| l -> l,[];;
let rec merge ord (l1,l2) = match l1,l2 with
[],l | l,[] -> l
| h1::t1,h2::t2 -> if ord h1 h2
then h1::merge ord (t1,l2)
else h2::merge ord (l1,t2);;
Is there any way to test if a function is recursive or not?
If you give a man a fish, you feed him for a day. But if you give him a fishing rod, you feed him for a lifetime.
Thus, instead of giving you the solution, I would better teach you how to solve it yourself.
A tail-recursive function is a recursive function, where all recursive calls are in a tail position. A call position is called a tail position if it is the last call in a function, i.e., if the result of a called function will become a result of a caller.
Let's take the following simple function as our working example:
let rec sum n = if n = 0 then 0 else n + sum (n-1)
It is not a tail-recursive function as the call sum (n-1) is not in a tail position because its result is then incremented by one. It is not always easy to translate a general recursive function into a tail-recursive form. Sometimes, there is a tradeoff between efficiency, readability, and tail-recursion.
The general techniques are:
use accumulator
use continuation-passing style
Using accumulator
Sometimes a function really needs to store the intermediate results, because the result of recursion must be combined in a non-trivial way. A recursive function gives us a free container to store arbitrary data - the call stack. A place, where the language runtime, stores parameters for the currently called functions. Unfortunately, the stack container is bounded, and its size is unpredictable. So, sometimes, it is better to switch from the stack to the heap. The latter is slightly slower (because it introduces more work to the garbage collector), but is bigger and more controllable. In our case, we need only one word to store the running sum, so we have a clear win. We are using less space, and we're not introducing any memory garbage:
let sum n =
let rec loop n acc = if n = 0 then acc else loop (n-1) (acc+n) in
loop n 0
However, as you may see, this came with a tradeoff - the implementation became slightly bigger and less understandable.
We used here a general pattern. Since we need to introduce an accumulator, we need an extra parameter. Since we don't want or can't change the interface of our function, we introduce a new helper function, that is recursive and will carry the extra parameter. The trick here is that we apply the summation before we do the recursive call, not after.
Using continuation-passing style
It is not always the case when you can rewrite your recursive algorithm using an accumulator. In this case, a more general technique can be used - the continuation-passing style. Basically, it is close to the previous technique, but we will use a continuation in the place of an accumulator. A continuation is a function, that will actually postpone the work, that is needed to be done after the recursion, to a later time. Conventionally, we call this function return or simply k (for the continuation). Mentally, the continuation is a way of throwing the result of computation back into the future. "Back" is because you returning the result back to the caller, in the future, because, the result will be used not now, but once everything is ready. But let's look at the implementation:
let sum n =
let rec loop n k = if n = 0 then k 0 else loop (n-1) (fun x -> k (x+n)) in
loop n (fun x -> x)
You may see, that we employed the same strategy, except that instead of int accumulator we used a function k as a second parameter. If the base case, if n is zero, we will return 0, (you can read k 0 as return 0). In the general case, we recurse in a tail position, with a regular decrement of the inductive variable n, however, we pack the work, that should be done with the result of the recursive function into a function: fun x -> k (x+n). Basically, this function says, once x - the result of recursion call is ready, add it to the number n and return. (Again, if we will use name return instead of k it could be more readable: fun x -> return (x+n)).
There is no magic here, we still have the same tradeoff, as with accumulator, as we create a new closure (functional object) at every recursive call. And each newly created closure contains a reference to the previous one (that was passed via the parameter). For example, fun x -> k (x+n) is a function, that captures two free variables, the value n and function k, that was the previous continuation. Basically, these continuations form a linked list, where each node bears a computation and all arguments except one. So, the computation is delayed until the last one is known.
Of course, for our simple example, there is no need to use CPS, since it will create unnecessary garbage and be much slower. This is only for demonstration. However, for more complex algorithms, in particular for those that combine results of two or more recursive calls in a non-trivial case, e.g., folding over a graph data structure.
So now, armed with the new knowledge, I hope that you will be able to solve your problems as easy as pie.
Testing for the tail recursion
The tail call is a pretty well-defined syntactic notion, so it should be pretty obvious whether the call is in a tail position or not. However, there are still few methods that allow one to check whether the call is in a tail position. In fact, there are other cases, when tail-call optimization may come into play. For example, a call that is right to the shortcircuit logical operator is also a tail call. So, it is not always obvious when a call is using the stack or it is a tail call. The new version of OCaml allows one to put an annotation at the call place, e.g.,
let rec sum n = if n = 0 then 0 else n + (sum [#tailcall]) (n-1)
If the call is not really a tail call, a warning is issued by a compiler:
Warning 51: expected tailcall
Another method is to compile with -annot option. The annotation file will contain an annotation for each call, for example, if we will put the above function into a file sum.ml and compile with ocamlc -annot sum.ml, then we can open sum.annot file and look for all calls:
"sum.ml" 1 0 41 "sum.ml" 1 0 64
call(
stack
)
If we, however, put our third implementation, then the see that all calls are tail calls, e.g. grep call -A1 sum.annot:
call(
tail
--
call(
tail
--
call(
tail
--
call(
tail
Finally, you can just test your program with some big input, and see whether your program will fail with the stack overflow. You can even reduce the size of the stack, this can be controlled with the environment variable OCAMLRUNPARAM, for example, to limit the stack to one thousand words:
export OCAMLRUNPARAM='l=1000'
ocaml sum.ml
You could do the following :
let rlist r n =
let aux acc n =
if n < 1 then acc
else aux (Random.int r :: acc) (n-1)
in aux [] n;;
let divide l =
let aux acc1 acc2 = function
| h1::h2::t ->
aux (h1::acc1) (h2::acc2) t
| [e] -> e::acc1, acc2
| [] -> acc1, acc2
in aux [] [] l;;
But for divide I prefer this solution :
let divide l =
let aux acc1 acc2 = function
| [] -> acc1, acc2
| hd::tl -> aux acc2 (hd :: acc1) tl
in aux [] [] l;;
let merge ord (l1,l2) =
let rec aux acc l1 l2 =
match l1,l2 with
| [],l | l,[] -> List.rev_append acc l
| h1::t1,h2::t2 -> if ord h1 h2
then aux (h1 :: acc) t1 l2
else aux (h2 :: acc) l1 t2
in aux [] l1 l2;;
As to your question about testing if a function is tail recursive or not, by looking out for it a bit you would have find it here.
I am learning OCaml. I know that OCaml provides us with both imperative style of programming and functional programming.
I came across this code as part of my course to compute the n'th Fibonacci number in OCaml
let memoise f =
let table = ref []
in
let rec find tab n =
match tab with
| [] ->
let v = (f n)
in
table := (n, v) :: !table;
v
| (n', v) :: t ->
if n' = n then v else (find t n)
in
fun n -> find !table n
let fibonacci2 = memoise fibonacci1
Where the function fibonacci1 is implemented in the standard way as follows:
let rec fibonacci1 n =
match n with
| 0 | 1 -> 1
| _ -> (fibonacci1 (n - 1)) + (fibonacci1 (n - 2))
Now my question is that how are we achieving memoisation in fibonacci2. table has been defined inside the function fibonacci2 and thus, my logic dictates that after the function finishes computation, the list table should get lost and after each call the table will get built again and again.
I ran some a simple test where I called the function fibonacci 35 twice in the OCaml REPL and the second function call returned the answer significantly faster than the first call to the function (contrary to my expectations).
I though that this might be possible if declaring a variable using ref gives it a global scope by default.
So I tried this
let f y = let x = ref 5 in y;;
print_int !x;;
But this gave me an error saying that the value of x is unbounded.
Why does this behave this way?
The function memoise returns a value, call it f. (f happens to be a function). Part of that value is the table. Every time you call memoise you're going to get a different value (with a different table).
In the example, the returned value f is given the name fibonacci2. So, the thing named fibonacci2 has a table inside it that can be used by the function f.
There is no global scope by default, that would be a huge mess. At any rate, this is a question of lifetime not of scope. Lifetimes in OCaml last as long as an object can be reached somehow. In the case of the table, it can be reached through the returned function, and hence it lasts as long as the function does.
In your second example you are testing the scope (not the lifetime) of x, and indeed the scope of x is restricted to the subexpresssion of its let. (I.e., it is meaningful only in the expression y, where it's not used.) In the original code, all the uses of table are within its let, hence there's no problem.
Although references are a little tricky, the underlying semantics of OCaml come from lambda calculus, and are extremely clean. That's why it's such a delight to code in OCaml (IMHO).
I was required to write a set of functions for problems in class. I think the way I wrote them was a bit more complicated than they needed to be. I had to implement all the functions myself, without using and pre-defined ones. I'd like to know if there are any quick any easy "one line" versions of these answers?
Sets can be represented as lists. The members of a set may appear in any order on the list, but there shouldn't be more than one
occurrence of an element on the list.
(a) Define dif(A, B) to
compute the set difference of A and B, A-B.
(b) Define cartesian(A,
B) to compute the Cartesian product of set A and set B, { (a, b) |
a∈A, b∈B }.
(c) Consider the mathematical-induction proof of the
following: If a set A has n elements, then the powerset of A has 2n
elements. Following the proof, define powerset(A) to compute the
powerset of set A, { B | B ⊆ A }.
(d) Define a function which, given
a set A and a natural number k, returns the set of all the subsets of
A of size k.
(* Takes in an element and a list and compares to see if element is in list*)
fun helperMem(x,[]) = false
| helperMem(x,n::y) =
if x=n then true
else helperMem(x,y);
(* Takes in two lists and gives back a single list containing unique elements of each*)
fun helperUnion([],y) = y
| helperUnion(a::x,y) =
if helperMem(a,y) then helperUnion(x,y)
else a::helperUnion(x,y);
(* Takes in an element and a list. Attaches new element to list or list of lists*)
fun helperAttach(a,[]) = []
helperAttach(a,b::y) = helperUnion([a],b)::helperAttach(a,y);
(* Problem 1-a *)
fun myDifference([],y) = []
| myDifference(a::x,y) =
if helper(a,y) then myDifference(x,y)
else a::myDifference(x,y);
(* Problem 1-b *)
fun myCartesian(xs, ys) =
let fun first(x,[]) = []
| first(x, y::ys) = (x,y)::first(x,ys)
fun second([], ys) = []
| second(x::xs, ys) = first(x, ys) # second(xs,ys)
in second(xs,ys)
end;
(* Problem 1-c *)
fun power([]) = [[]]
| power(a::y) = union(power(y),insert(a,power(y)));
I never got to problem 1-d, as these took me a while to get. Any suggestions on cutting these shorter? There was another problem that I didn't get, but I'd like to know how to solve it for future tests.
(staircase problem) You want to go up a staircase of n (>0) steps. At one time, you can go by one step, two steps, or three steps. But,
for example, if there is one step left to go, you can go only by one
step, not by two or three steps. How many different ways are there to
go up the staircase? Solve this problem with sml. (a) Solve it
recursively. (b) Solve it iteratively.
Any help on how to solve this?
Your set functions seem nice. I would not change anything principal about them except perhaps their formatting and naming:
fun member (x, []) = false
| member (x, y::ys) = x = y orelse member (x, ys)
fun dif ([], B) = []
| dif (a::A, B) = if member (a, B) then dif (A, B) else a::dif(A, B)
fun union ([], B) = B
| union (a::A, B) = if member (a, B) then union (A, B) else a::union(A, B)
(* Your cartesian looks nice as it is. Here is how you could do it using map: *)
local val concat = List.concat
val map = List.map
in fun cartesian (A, B) = concat (map (fn a => map (fn b => (a,b)) B) A) end
Your power is also very neat. If you call your function insert, it deserves a comment about inserting something into many lists. Perhaps insertEach or similar is a better name.
On your last task, since this is a counting problem, you don't need to generate the actual combinations of steps (e.g. as lists of steps), only count them. Using the recursive approach, try and write the base cases down as they are in the problem description.
I.e., make a function steps : int -> int where the number of ways to take 0, 1 and 2 steps are pre-calculated, but for n steps, n > 2, you know that there is a set of combinations of steps that begin with either 1, 2 or 3 steps plus the number combinations of taking n-1, n-2 and n-3 steps respectively.
Using the iterative approach, start from the bottom and use parameterised counting variables. (Sorry for the vague hint here.)
I am trying to write a function that returns the index of the passed value v in a given list x; -1 if not found. My attempt at the solution:
let rec index (x, v) =
let i = 0 in
match x with
[] -> -1
| (curr::rest) -> if(curr == v) then
i
else
succ i; (* i++ *)
index(rest, v)
;;
This is obviously wrong to me (it will return -1 every time) because it redefines i at each pass. I have some obscure ways of doing it with separate functions in my head, none which I can write down at the moment. I know this is a common pattern in all programming, so my question is, what's the best way to do this in OCaml?
Mutation is not a common way to solve problems in OCaml. For this task, you should use recursion and accumulate results by changing the index i on certain conditions:
let index(x, v) =
let rec loop x i =
match x with
| [] -> -1
| h::t when h = v -> i
| _::t -> loop t (i+1)
in loop x 0
Another thing is that using -1 as an exceptional case is not a good idea. You may forget this assumption somewhere and treat it as other indices. In OCaml, it's better to treat this exception using option type so the compiler forces you to take care of None every time:
let index(x, v) =
let rec loop x i =
match x with
| [] -> None
| h::t when h = v -> Some i
| _::t -> loop t (i+1)
in loop x 0
This is pretty clearly a homework problem, so I'll just make two comments.
First, values like i are immutable in OCaml. Their values don't change. So succ i doesn't do what your comment says. It doesn't change the value of i. It just returns a value that's one bigger than i. It's equivalent to i + 1, not to i++.
Second the essence of recursion is to imagine how you would solve the problem if you already had a function that solves the problem! The only trick is that you're only allowed to pass this other function a smaller version of the problem. In your case, a smaller version of the problem is one where the list is shorter.
You can't mutate variables in OCaml (well, there is a way but you really shouldn't for simple things like this)
A basic trick you can do is create a helper function that receives extra arguments corresponding to the variables you want to "mutate". Note how I added an extra parameter for the i and also "mutate" the current list head in a similar way.
let rec index_helper (x, vs, i) =
match vs with
[] -> -1
| (curr::rest) ->
if(curr == x) then
i
else
index_helper (x, rest, i+1)
;;
let index (x, vs) = index_helper (x, vs, 0) ;;
This kind of tail-recursive transformation is a way to translate loops to functional programming but to be honest it is kind of low level (you have full power but the manual recursion looks like programming with gotos...).
For some particular patterns what you can instead try to do is take advantage of reusable higher order functions, such as map or folds.