As a follow-up to yesterday's question, Erlang: choosing unique items from a list, using recursion:
In Erlang, say I wanted to choose all unique items from a given list, e.g.
List = [foo, bar, buzz, foo].
and I had used your code examples resulting in
NewList = [bar, buzz].
How would I further manipulate NewList in Erlang?
For example, say I not only wanted to choose all unique items from List, but also count the total number of characters of all resulting items from NewList?
In functional programming we have patterns that occur so frequently they deserve their own names and support functions. Two of the most widely used ones are map and fold (sometimes reduce). These two form basic building blocks for list manipulation, often obviating the need to write dedicated recursive functions.
Map
The map function iterates over a list in order, generating a new list where each element is the result of applying a function to the corresponding element in the original list. Here's how a typical map might be implemented:
map(Fun, [H|T]) ->  % recursive case
    [Fun(H)|map(Fun, T)];
map(_Fun, []) ->    % base case
    [].
This is a perfect introductory example of recursive functions; roughly speaking, the function clauses are either recursive cases (resulting in a call to itself with a smaller problem instance) or base cases (no recursive calls made).
So how do you use map? Notice that the first argument, Fun, is supposed to be a function. In Erlang, it's possible to declare anonymous functions (sometimes called lambdas) inline. For example, to square each number in a list, generating a list of squares:
map(fun(X) -> X*X end, [1,2,3]). % => [1,4,9]
This is an example of higher-order programming.
Note that map is part of the Erlang standard library as lists:map/2.
Fold
Whereas map creates a 1:1 element mapping between one list and another, the purpose of fold is to apply some function to each element of a list while accumulating a single result, such as a sum. The left fold (it helps to think of it as walking the list from left to right, carrying the accumulator along) might look like so:
foldl(Fun, Acc, [H|T]) ->  % recursive case
    foldl(Fun, Fun(H, Acc), T);
foldl(_Fun, Acc, []) ->    % base case
    Acc.
Using this function, we can sum the elements of a list:
foldl(fun(X, Sum) -> Sum + X end, 0, [1,2,3,4,5]). %% => 15
Note that foldr and foldl are both part of the Erlang standard library, in the lists module.
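A quick shell session (mine, not from the original answer) shows the difference in direction, using list construction as the folding function:
1> lists:foldl(fun(X, Acc) -> [X|Acc] end, [], [1,2,3]).
[3,2,1]
2> lists:foldr(fun(X, Acc) -> [X|Acc] end, [], [1,2,3]).
[1,2,3]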
While it may not be immediately obvious, a very large class of common list-manipulation problems can be solved using map and fold alone.
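To tie this back to the original question: assuming the unique items are strings (e.g. NewList = ["bar", "buzz"]), the total character count can be computed with nothing but a map and a fold from the lists module. A minimal sketch (the variable names are mine):
NewList = ["bar", "buzz"],
Lengths = lists:map(fun(W) -> length(W) end, NewList),        % [3,4]
Total   = lists:foldl(fun(N, Acc) -> N + Acc end, 0, Lengths). % 7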
Thinking recursively
Writing recursive algorithms might seem daunting at first, but as you get used to it, it turns out to be quite natural. When encountering a problem, you should identify two things:
1. How can I decompose the problem into smaller instances? In order for recursion to be useful, the recursive call must take a smaller problem as its argument, or the function will never terminate.
2. What's the base case, i.e. the termination criterion?
As for 1), consider the problem of counting the elements of a list. How could this possibly be decomposed into smaller subproblems? Well, think of it this way: Given a non-empty list whose first element (head) is X and whose remainder (tail) is Y, its length is 1 + the length of Y. Since Y is smaller than the list [X|Y], we've successfully reduced the problem.
Continuing the list example, when do we stop? Well, eventually, the tail will be empty. We fall back to the base case, which is the definition that the length of the empty list is zero. You'll find that writing function clauses for the various cases is very much like writing definitions for a dictionary:
%% Definition:
%% The length of a list whose head is H and whose tail is T is
%% 1 + the length of T.
%% (Named len/1 here to avoid clashing with the auto-imported BIF length/1.)
len([H|T]) ->
    1 + len(T);
%% Definition: The length of the empty list ([]) is zero.
len([]) ->
    0.
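Reading those definitions back also gives you a hand trace: len([a,b,c]) = 1 + len([b,c]) = 1 + (1 + len([c])) = 1 + (1 + (1 + len([]))) = 1 + 1 + 1 + 0 = 3.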
You could use a fold to recurse over the resulting list. For simplicity I turned your atoms into strings (you could do this with atom_to_list/1):
1> NewList = ["bar", "buzz"].
["bar","buzz"]
2> L = lists:foldl(fun (W, Acc) -> [{W, length(W)}|Acc] end, [], NewList).
[{"buzz",4},{"bar",3}]
This returns a proplist you can access like so:
3> proplists:get_value("buzz", L).
4
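If what you actually want is the single total rather than the per-word counts, you could fold once more, either over the proplist or over NewList directly. Continuing the same shell session (a sketch):
4> lists:foldl(fun({_W, N}, Sum) -> Sum + N end, 0, L).
7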
If you want to build the recursion yourself for didactic purposes instead of using lists:
count_char_in_list([], Count) ->
    Count;
count_char_in_list([Head | Tail], Count) ->
    count_char_in_list(Tail, Count + length(Head)). % a string is just a list of numbers
And then:
1> test:count_char_in_list(["bar", "buzz"], 0).
7
When accumulating a collection (just a collection, not necessarily a list) of values into a single value, there are two options.
reduce(), which takes a List<T> and a function (T, T) -> T, and applies that function iteratively until the whole list is reduced into a single value.
fold(), which takes a List<T>, an initial value V, and a function (V, T) -> V, and applies that function iteratively until the whole list is folded into a single value.
I know that both of them have their own use cases. For example, reduce() can be used to find the maximum value in a list, and fold() can be used to find the sum of all values in a list.
But in that example, instead of using fold(), you can add(0) to the list and then reduce(). Another use case of fold is to join all elements into a string. But this can also be done without fold, by map |> toString() followed by reduce().
Just out of curiosity, the question is, can every use case of fold() be avoided given functions map(), filter(), reduce() and add()? (also remove() if required.)
It's the other way around. reduce(L, f) = fold(rest(L), first(L), f), so there's no special need for reduce -- it's just a short form for a common fold pattern.
fold has lots of use cases of its own, though.
The example you gave for string concatenation is one of them -- you can fold items into a special string accumulator much more efficiently than you can build strings by incremental accumulation. (exactly how depends on the language, but it's true pretty much everywhere).
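In Erlang terms, for instance, the trick is in how you combine: right-folding with W ++ Acc copies each word only once, whereas left-folding with Acc ++ W would recopy the growing accumulator at every step. A small sketch (the function name is mine):
concat(Words) ->
    lists:foldr(fun(W, Acc) -> W ++ Acc end, "", Words).
%% concat(["foo", "bar", "baz"]) gives "foobarbaz"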
Applying a list of incremental changes to a target object is a pretty common pattern. Adding files to a folder, drawing shapes on a canvas, turning a list into a set, crossing off completed items in a to-do list, etc., are all examples of this pattern.
Also map(L, f) = fold(L, newMap(), (M, v) -> add(M, f(v))), so map is also just a common fold pattern. Similarly, filter(L, f) = fold(L, newList(), (Acc, v) -> f(v) ? add(Acc, v) : Acc).
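For the record, here is roughly what those identities look like in Erlang (the language used elsewhere on this page); the helper names are mine and this is only a sketch:
%% map expressed as a fold: apply F to each element and collect the results.
map_via_fold(F, L) ->
    lists:reverse(lists:foldl(fun(X, Acc) -> [F(X) | Acc] end, [], L)).

%% filter expressed as a fold: keep X only when Pred(X) returns true.
filter_via_fold(Pred, L) ->
    lists:reverse(lists:foldl(fun(X, Acc) ->
        case Pred(X) of
            true  -> [X | Acc];
            false -> Acc
        end
    end, [], L)).

%% reduce expressed as a fold: seed the fold with the first element.
reduce_via_fold(F, [H | T]) ->
    lists:foldl(F, H, T).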
The following code snippet comes from the official OCaml website:
# let rec compress = function
| a :: (b :: _ as t) -> if a = b then compress t else a :: compress t
| smaller -> smaller;;
val compress : 'a list -> 'a list = <fun>
The above function 'compresses' a list with consecutive, duplicative elements, e.g. :
# compress ["a";"a";"a";"a";"b";"c";"c";"a";"a";"d";"e";"e";"e";"e"];;
- : string list = ["a"; "b"; "c"; "a"; "d"; "e"]
I'm having a devil of a time understanding the logic of the above code. I'm used to coding imperatively, so this recursive, functional approach, combined with OCaml's laconic - but obscure - syntax, is causing me to struggle.
For example, where is the base case? Is it smaller -> smaller? I know smaller is a variable, or an identifier, but what is it returning (is returning even the right term in OCaml for what's happening here)?
I know that lists in OCaml are singly linked, so I'm also wondering whether a new list is being generated, or whether elements of the existing list are being cut out. Since OCaml is functional, I'm inclined to think that lists are not mutable - is that correct? If you want to change a list, you essentially need to generate a new list with the elements you're seeking to add (or without the elements you're seeking to excise). Is this a correct understanding?
Yes, the base case is this:
| smaller -> smaller
The first pattern of the match expression matches any list of length 2 or greater. (It would be good to make sure you see why this is the case.)
Since OCaml matches patterns in order, the base case matches lists of lengths 0 and 1. That's why the programmer chose the name smaller. They were thinking "this is some smaller list".
The parts of a match expression look like this in general:
| pattern -> result
Any names in the pattern are bound to parts of the value matched against the pattern (as you say). So smaller is bound to the whole list. So in sum, the second part of the match says that if the list is of length 0 or 1, the result should just be the list itself.
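For example, tracing compress ["a"; "a"; "b"] by hand: the first pattern matches with a = "a" and b = "a"; since a = b we drop the head and return compress ["a"; "b"]. Now a = "a" and b = "b" differ, so we keep the head: "a" :: compress ["b"]. The one-element list ["b"] only matches smaller, so it is returned as-is, giving "a" :: ["b"] = ["a"; "b"].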
Lists in OCaml are immutable, so it's not possible for the result of the function to be a modified version of the list. The result is a new list, unless the list is already a short list (of length 0 or 1).
So, what you say about the immutability of OCaml lists is exactly correct.
This was a question on a sample exam I did.
Give the definition of a Prolog predicate split_into_pairs that takes as arguments a list and returns as a result a list which consists of paired elements. For example, split_into_pairs([1,2,3,4,5,6],X) would return as a result X=[[1,2],[3,4],[5,6]]. Similarly, split_into_pairs([a,2,3,4,a,a,a,a],X) would return as result X=[[a,2],[3,4],[a,a],[a,a]] while split_into_pairs([1,2,3],X) would return No.
It's not meant to be done using built-in predicates I believe, but it shouldn't need to be too complicated either as it was only worth 8/120 marks.
I'm not sure what it should do for a list of two elements, so I guess that would either be not specified so that it returns no, or split_into_pairs([A,B],[[A,B]]).
My main issue is how to do the recursive call properly, without adding extra brackets and ending up with something like X=[[A,B],[[C,D],[[E,F]]]].
My most recent attempts have been variations of the code below, but obviously this is incorrect.
split_into_pairs([A,B],[A,B])
split_into_pairs([A,B|T], X) :- split_into_pairs(T, XX), X is [A,B|XX]
This is a relatively straightforward recursion:
split_into_pairs([], []).
split_into_pairs([First, Second | Tail], [[First, Second] | Rest]) :-
    split_into_pairs(Tail, Rest).
The first rule says that an empty list is already split into pairs; the second requires that the source list has at least two items, pairs them up, and inserts the result of pairing up the tail list behind them.
Your solution could be fixed as well by adding square brackets in the result, and moving the second part of the rule into the header, like this:
split_into_pairs([A,B],[[A,B]]).
split_into_pairs([A,B|T], [[A,B]|XX]) :- split_into_pairs(T, XX).
Note that this solution does not consider an empty list a list of pairs, so split_into_pairs([], X) would fail.
Your code is almost correct. It has obvious syntax issues, and several substantive issues:
split_into_pairs([A,B], [ [ A,B ] ] ):- !.
split_into_pairs([A,B|T], X) :- split_into_pairs(T, XX),
X = [ [ A,B ] | XX ] .
Now it is correct: = is used instead of is (which is normally used with arithmetic operations), both clauses are properly terminated by dots, and the first one has a cut added into it, to make the predicate deterministic, to produce only one result. The correct structure is produced by enclosing each pair of elements into a list of their own, with brackets.
This is inefficient though, because it describes a recursive process - it constructs the result on the way back from the base case.
The efficient definition works on the way forward from the starting case:
split_into_pairs([A,B],[[A,B]]):- !.
split_into_pairs([A,B|T], X) :- X = [[A,B]|XX], split_into_pairs(T, XX).
This is the essence of the tail recursion modulo cons optimization technique, which turns recursive processes into iterative ones - ones that are able to run in constant stack space. It is very similar to the tail recursion with accumulator technique.
The cut had to be introduced because the two clauses are not mutually exclusive: a term unifying with [A,B] could also be unifiable with [A,B|T], in case T=[]. We can get rid of the cut by making the two clauses mutually exclusive:
split_into_pairs([], [] ).
split_into_pairs([A,B|T], [[A,B]|XX]):- split_into_pairs(T, XX).
Is there an easy way to go through a list?
Let's say I wanted to access the 5th item in the list below, not knowing in advance that it is a "B":
["A","A","A","A","B","A","A","A","A"]
Is there a way I can do it without having to step through the whole list?
I do not know Miranda that well, but I expect the functions skip and take are available.
You can address the 5th element by making a function out of skip and take. When skip and take are not available, it is easy to create them yourself:
skip: skips the first y elements of a list; when y is greater than the number of items in the list, it returns an empty list.
take: takes the first y elements of a list; when y is greater than the number of items in the list, the full list is returned.
skip y [] = []
skip 0 xs = xs
skip y (x:xs) = skip (y-1) xs

take y [] = []
take 0 xs = []
take y (x:xs) = x : take (y-1) xs

|| returns a one-element list holding the item at (zero-based) position x
elementAt x xs = take 1 (skip x xs)
Lists are inductive datatypes. This means that functions defined over lists - for instance, accessing the nth element - are defined by recursion. The data structure you are looking for appears to be an array, which allows constant time lookup. The easiest way to find the element at an index in a list is directly:
lookup :: Int -> [a] -> Maybe a
lookup n [] = Nothing
lookup 0 (x:xs) = Just x
lookup n (x:xs) = lookup (n - 1) xs
Another way to do this would be to use the ! operator. Let's say you have a program with the list defined as:
plist = ["A","A","A","A","B","A","A","A","A"]
then evaluating plist!4 will give you the 5th element of that list (4 being the index of the 5th item when you count from 0: 0,1,2,3,4).
So plist!4 returns "B".
Lists are not arrays.
You can only access elements starting from the first. Think of lists as streams (like a song playing on the radio). Lists may be of infinite length (as the radio never stops).
Most programmers use "syntactic" sugar, which hides the nature of lists behind an easier syntax.
Miranda automatically loads a default library named stdenv.m, which you can study.
Now, let's think about your problem:
You want to ignore ("drop") all elements before the 5th and then get the first element of the remaining list.
This is expressed in Miranda as:
nth :: num -> [*] -> *
nth n = hd . drop (n-1)
This is a function with an explicit type declaration, to show that the function works with every list (the elements have the generic type *).
Sample:
plist :: [[char]]
plist = ["A","A","A","A","B","A","A","A","A"]
result :: [char]
result = nth 5 plist
If you want to code your functions with error handling, you need techniques to catch that there is no 5th element in your list.
As seen above, one technique is "Maybe". Another is continuations.
A bad technique is to check the length of the list first, because this will never terminate for infinite lists.
From page 90 of Erlang Programming by Cesarini and Thomson, there is an example that has no detailed discussion. I'm quite the newbie to functional programming and recursive thinking, so I'm not familiar in solving problems this way.
"For example, the following function merges two lists (of the same length) by interleaving
their values: "
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
mergeL([],[],Zs) -> Zs.
mergeR(Xs,[Y|Ys],Zs) -> mergeL(Xs,Ys,[Y|Zs]);
mergeR([],[],Zs) -> Zs.
How does this work? Thanks!
Step through it:
merge([1,2],[3,4])
reverse(mergeL([1,2],[3,4],[]))
reverse(mergeR([2],[3,4],[1]))
reverse(mergeL([2],[4],[3,1]))
reverse(mergeR([], [4], [2,3,1]))
reverse(mergeL([], [], [4,2,3,1]))
reverse([4,2,3,1])
[1,3,2,4]
It's always good to work through these functions by hand on a piece of paper with a small input when you're trying to figure them out. You'll quickly see how they work.
This function is called first:
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
The empty list [] passed to mergeL is the accumulator - this is where the answer will come from. Note that the first function calls mergeL - the left merge.
Let us pretend that this function is called as so:
merge([1, 2, 3], [a, b, c])
Two lists of the same length. This first function then calls mergeL:
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
mergeL([],[],Zs) -> Zs.
There are 2 clauses in the left merge. A call to mergeL will be matched against these clauses in top-down order.
The second of these clauses has three parameters - the first two of these are empty lists []. However, the first time mergeL is called these two lists aren't empty; they are the lists Xs and Ys, so the first clause matches.
Lets break out the matches. This is the call to mergeL:
mergeL([1, 2, 3], [a, b, c], [])
and it matches the first clause in the following fashion:
X = 1
Xs = [2, 3]
Ys = [a, b, c]
Zs = []
This is because of the special form of the list:
[X | Xs]
This means match X to the head of the list (an individual item) and make Xs the tail of the list (a list).
We then build up the new function call. We can add the value X to the start of the list Zs the same way we pattern matched it out so we get the first mergeR call:
mergeR([2, 3], [a, b, c], [1])
The final argument is a one-item list caused by adding an item at the head of an empty list.
This then zips through until the end.
Actually the final clause of mergeR is redundant. For lists of the same length, the recursion will always exhaust in the final clause of mergeL (but I will leave working out why as an exercise for the reader).
What the example does is define a few states that the recursion will go through. There are 3 'functions' that are defined:
merge, mergeL and mergeR.
The lists to merge are Xs and Ys, whereas the Zs are the result of the merge.
The merge will start with calling 'merge' and supplying two lists. The first step is to call mergeL with the two lists to merge, and an empty resultset.
[X|Xs] takes the first element of the list (very much like array_shift would). This element is added to the head of the resultset ([X|Zs] does this). This resultset (containing one element now) is then passed to the next call, mergeR. mergeR does the same thing, only it takes an element from the second list. This behaviour will continue as long as the lists fed to mergeL or mergeR are not empty.
When mergeL or mergeR is called with two empty lists ([]) and a resultset (Zs), it will return the resultset (and not do another run, thus stopping the recursion).
Summary:
The start of the recursion is the first line, which defines 'merge'. This start will set the whole thing in motion by calling the first mergeL.
The body of the recursion is lines 2 and 4, which define the behaviour of mergeL and mergeR, which both call each other.
The stop of the recursion is defined by lines 3 and 5, which basically tell the whole thing what to do when there are no more elements in the lists.
Hope this helps!
I always look for those functions that will terminate the recursion first, in this case:
mergeL([],[],Zs) -> Zs.
and
mergeR([],[],Zs) -> Zs.
both of those will basically finish the "merging" when the first two parameters are empty lists.
So then I look at the first call of the function:
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
Ignoring the reverse for a second, you will see that the last parameter is an empty list. So I'd expect the various mergeL and mergeR functions to move the elements of the two input lists into that final parameter - and when they have all been moved the function will basically terminate (although finally calling the reverse function, of course).
And that is exactly what the remaining functions do:
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
takes the first element of the first list and puts it at the head of the Zs list, and
mergeR(Xs,[Y|Ys],Zs) -> mergeL(Xs,Ys,[Y|Zs]);
takes the first element of the second list and puts it at the head of the Zs list. The calling of mergeR from mergeL and vice versa is what does the interleaving.
What's interesting to see (and easy to fix) is that the lists Xs and Ys must be of the same length, or you'll end up calling mergeL or mergeR with an empty list where a non-empty one is expected - and that won't match either [X|Xs] or [Y|Ys].
And the reason for the reverse is simply efficiency: adding an element to the head with [X|Zs] is a constant-time operation, whereas appending to the tail (Zs ++ [X]) would copy the whole accumulator on every step. So the result is built in reverse and flipped once at the end.
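For comparison, a body-recursive variant that builds the result in order (so no reverse is needed) can be written by swapping the two argument lists on each call. This is a sketch of my own, not code from the book, and like the original it assumes the two lists have the same length:
merge2([X | Xs], Ys) -> [X | merge2(Ys, Xs)];
merge2([], []) -> [].
%% merge2([1,2],[3,4]) gives [1,3,2,4]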