When accumulating a collection (just collection, not list) of values into a single value, there are two options.
reduce(). Which takes a List<T>, and a function (T, T) -> T, and applies that function iteratively until the whole list is reduced into a single value.
fold(). Which takes a List<T>, an initial value V, and a function (V, T) -> V, and applies that function iteratively until the whole list is folded into a single value.
I know that both of them have their own use cases. For eg, reduce() can be used to find maximum value in a list and fold() can be used to find sum of all values in a list.
But, in that example, instead of using fold(), you can add(0), and then reduce(). Another use case of fold is to join all elements into a string. But this can also be done without using fold, by map |> toString() followed by reduce().
Just out of curiosity, the question is, can every use case of fold() be avoided given functions map(), filter(), reduce() and add()? (also remove() if required.)
It's the other way around. reduce(L,f) = fold(first(L), rest(L), f), so there's no special need for reduce -- it's just a short form for a common fold pattern.
fold has lots of use cases of its own, though.
The example you gave for string concatenation is one of them -- you can fold items into a special string accumulator much more efficiently than you can build strings by incremental accumulation. (exactly how depends on the language, but it's true pretty much everywhere).
Applying a list of incremental changes to a target object is a pretty common pattern. Adding files to a folder, drawing shapes on a canvas, turning a list into a set, crossing off completed items in a to-do list, etc., are all examples of this pattern.
Also map(L,f) = fold(newMap(), L, (M,v) -> add(M,f(v)), so map is also just a common fold pattern. Similarly, filter(L,f) = fold(newList(), L, (L,v) -> f(v) ? add(L,v) : L).
Related
If we have two lists l1 and l2 and we want to concatenate them we can use # or append which is in O(n1) where n1 is the length of l1. Or we can use rev_append which is according to the doc:
equivalent to List.rev l1 # l2, but rev_append is tail-recursive and more efficient.
So is rev_append more efficient than # or is it more efficient than List.rev + #? And is it better to use it instead of # and append when we don't care about the order?
OCaml lists are immutable. The second list doesn't need to be changed, but the first list has to be copied so the copy can point to the second list. Hence you're going to have to traverse the first list somehow. Nothing you can do will change the big-O time complexity of the append.
Since you can only add new elements at the beginning of a list, you need to traverse the first list in reverse order if you want the result to preserve the order of the first list.
The most obvious way to do this is to call recursively until you're at the end of the first list, then do the prefixing as you return from each recursive call. However this isn't tail-recursive. I.e., it will consume stack space proportional to the length of the first list. When the first list is long, you can run out of stack space (aka stack overflow).
This is the way that # works. It takes time and stack space proportional to the length of the first list.
Another idea is to give up on maintaining the order of the first list. If you prefix the first list in reverse order, you can can easily make the operation tail recursive. That's the purpose of List.rev_append. It takes constant stack space.
If you want to maintain the original list orders, but also use constant stack space you can reverse the first list (with List.rev), then use List.rev_append.
Plain List.rev_append is faster than # because it doesn't have to make internal function calls--it can just be a loop. It's also obviously faster than List.rev plus List.rev_append.
In summary if you don't care about the final order, then List.rev_append is faster than #, yes. Also it won't overflow the stack. It's not going to be a gigantic amount faster because the time complexity is basically the same.
Given code that looks something like this (pseudocode):
f(args)
result = simple_case
if (base_case(args))
return result
new_args = args
for each x in args list
remove x from new_args
if (need_to_update_result_based_on_x(result,x))
result = f(new_args)
return result
I already know how to remove tail-recursion from a function, but I am unsure if there are straightforward techniques for removing recursion that itself is inside of a loop, as above.
In particular, I am especially wondering if it is it possible to rewrite it so that it is purely iterative? By which I mean that it does not even need to effectively emulate the recursive design by simply implementing it's own stack? If not, what would generally be the most economical (in terms of storage and time) way to rewrite such a function?
The args list is being reduced by each iteration for the for loop in the function, and the reduced list is also passed to the next recursive call. So, when the recursive call returns, the loop picks up with the list that it had when it made the recursive call, but the recursive call itself has already worked out results for the reduced versions of that list.
To avoid redoing work already done, you can cache the results of the previous recursive calls so that future recursive calls need not redo the complete computation. This technique is often referred to as memoization.
So, to remove the recursion altogether, the base case results can be computed upfront and remembered (memoized). These memoizations can be used to compute results for the base case plus one of the other arguments in the list, and these results also memoized (at this point, the base case results can be tossed). You can repeatedly compute memoizations for each successive larger list size until you achieve a result for the desired argument list.
This was a question on a sample exam I did.
Give the definition of a Prolog predicate split_into_pairs that takes as arguments a list and returns as a result a list which consists of paired elements. For example, split_into_pairs([1,2,3,4,5,6],X) would return as a result X=[[1,2],[3,4],[5,6]]. Similarly, split_into_pairs([a,2,3,4,a,a,a,a],X) would return as result X=[[a,2],[3,4],[a,a],[a,a]] while split_into_pairs([1,2,3],X) would return No.
It's not meant to be done using built-in predicates I believe, but it shouldn't need to be too complicated either as it was only worth 8/120 marks.
I'm not sure what it should do for a list of two elements, so I guess that would either be not specified so that it returns no, or split_into_pairs([A,B],[[A,B]]).
My main issue is how to do the recursive call properly, without having extra brackets, not ending up as something like X=[[A,B],[[C,D],[[E,F]]]]?.
My most recent attempts have been variations of the code below, but obviously this is incorrect.
split_into_pairs([A,B],[A,B])
split_into_pairs([A,B|T], X) :- split_into_pairs(T, XX), X is [A,B|XX]
This is a relatively straightforward recursion:
split_into_pairs([], []).
split_into_pairs([First, Second | Tail], [[First, Second] | Rest]) :-
split_into_pairs(Tail, Rest).
The first rule says that an empty list is already split into pairs; the second requires that the source list has at least two items, pairs them up, and inserts the result of pairing up the tail list behind them.
Here is a demo on ideone.
Your solution could be fixed as well by adding square brackets in the result, and moving the second part of the rule into the header, like this:
split_into_pairs([A,B],[[A,B]]).
split_into_pairs([A,B|T], [[A,B]|XX]) :- split_into_pairs(T, XX).
Note that this solution does not consider an empty list a list of pairs, so split_into_pairs([], X) would fail.
Your code is almost correct. It has obvious syntax issues, and several substantive issues:
split_into_pairs([A,B], [ [ A,B ] ] ):- !.
split_into_pairs([A,B|T], X) :- split_into_pairs(T, XX),
X = [ [ A,B ] | XX ] .
Now it is correct: = is used instead of is (which is normally used with arithmetic operations), both clauses are properly terminated by dots, and the first one has a cut added into it, to make the predicate deterministic, to produce only one result. The correct structure is produced by enclosing each pair of elements into a list of their own, with brackets.
This is inefficient though, because it describes a recursive process - it constructs the result on the way back from the base case.
The efficient definition works on the way forward from the starting case:
split_into_pairs([A,B],[[A,B]]):- !.
split_into_pairs([A,B|T], X) :- X = [[A,B]|XX], split_into_pairs(T, XX).
This is the essence of tail recursion modulo cons optimization technique, which turns recursive processes into iterative ones - such that are able to run in constant stack space. It is very similar to the tail-recursion with accumulator technique.
The cut had to be introduced because the two clauses are not mutually exclusive: a term unifying with [A,B] could also be unifiable with [A,B|T], in case T=[]. We can get rid of the cut by making the two clauses to be mutually-exclusive:
split_into_pairs([], [] ).
split_into_pairs([A,B|T], [[A,B]|XX]):- split_into_pairs(T, XX).
As follow up to yesterday's question Erlang: choosing unique items from a list, using recursion
In Erlang, say I wanted choose all unique items from a given list, e.g.
List = [foo, bar, buzz, foo].
and I had used your code examples resulting in
NewList = [bar, buzz].
How would I further manipulate NewList in Erlang?
For example, say I not only wanted to choose all unique items from List, but also count the total number of characters of all resulting items from NewList?
In functional programming we have patterns that occur so frequently they deserve their own names and support functions. Two of the most widely used ones are map and fold (sometimes reduce). These two form basic building blocks for list manipulation, often obviating the need to write dedicated recursive functions.
Map
The map function iterates over a list in order, generating a new list where each element is the result of applying a function to the corresponding element in the original list. Here's how a typical map might be implemented:
map(Fun, [H|T]) -> % recursive case
[Fun(H)|map(Fun, T)];
map(_Fun, []) -> % base case
[].
This is a perfect introductory example to recursive functions; roughly speaking, the function clauses are either recursive cases (result in a call to iself with a smaller problem instance) or base cases (no recursive calls made).
So how do you use map? Notice that the first argument, Fun, is supposed to be a function. In Erlang, it's possible to declare anonymous functions (sometimes called lambdas) inline. For example, to square each number in a list, generating a list of squares:
map(fun(X) -> X*X end, [1,2,3]). % => [1,4,9]
This is an example of Higher-order programming.
Note that map is part of the Erlang standard library as lists:map/2.
Fold
Whereas map creates a 1:1 element mapping between one list and another, the purpose of fold is to apply some function to each element of a list while accumulating a single result, such as a sum. The right fold (it helps to think of it as "going to the right") might look like so:
foldr(Fun, Acc, [H|T]) -> % recursive case
foldr(Fun, Fun(H, Acc), T);
foldr(_Fun, Acc, []) -> % base case
Acc.
Using this function, we can sum the elements of a list:
foldr(fun(X, Sum) -> Sum + X, 0, [1,2,3,4,5]). %% => 15
Note that foldr and foldl are both part of the Erlang standard library, in the lists module.
While it may not be immediately obvious, a very large class of common list-manipulation problems can be solved using map and fold alone.
Thinking recursively
Writing recursive algorithms might seem daunting at first, but as you get used to it, it turns out to be quite natural. When encountering a problem, you should identify two things:
How can I decompose the problem into smaller instances? In order for recursion to be useful, the recursive call must take a smaller problem as its argument, or the function will never terminate.
What's the base case, i.e. the termination criterion?
As for 1), consider the problem of counting the elements of a list. How could this possibly be decomposed into smaller subproblems? Well, think of it this way: Given a non-empty list whose first element (head) is X and whose remainder (tail) is Y, its length is 1 + the length of Y. Since Y is smaller than the list [X|Y], we've successfully reduced the problem.
Continuing the list example, when do we stop? Well, eventually, the tail will be empty. We fall back to the base case, which is the definition that the length of the empty list is zero. You'll find that writing function clauses for the various cases is very much like writing definitions for a dictionary:
%% Definition:
%% The length of a list whose head is H and whose tail is T is
%% 1 + the length of T.
length([H|T]) ->
1 + length(T);
%% Definition: The length of the empty list ([]) is zero.
length([]) ->
0.
You could use a fold to recurse over the resulting list. For simplicity I turned your atoms into strings (you could do this with list_to_atom/1):
1> NewList = ["bar", "buzz"].
["bar","buzz"]
2> L = lists:foldl(fun (W, Acc) -> [{W, length(W)}|Acc] end, [], NewList).
[{"buzz",4},{"bar",3}]
This returns a proplist you can access like so:
3> proplists:get_value("buzz", L).
4
If you want to build the recursion yourself for didactic purposes instead of using lists:
count_char_in_list([], Count) ->
Count;
count_char_in_list([Head | Tail], Count) ->
count_char_in_list(Tail, Count + length(Head)). % a string is just a list of numbers
And then:
1> test:count_char_in_list(["bar", "buzz"], 0).
7
I'm new to OCaml, and I'd like to implement Gaussian Elimination as an exercise. I can easily do it with a stateful algorithm, meaning keep a matrix in memory and recursively operating on it by passing around a reference to it.
This statefulness, however, smacks of imperative programming. I know there are capabilities in OCaml to do this, but I'd like to ask if there is some clever functional way I haven't thought of first.
OCaml arrays are mutable, and it's hard to avoid treating them just like arrays in an imperative language.
Haskell has immutable arrays, but from my (limited) experience with Haskell, you end up switching to monadic, mutable arrays in most cases. Immutable arrays are probably amazing for certain specific purposes. I've always imagined you could write a beautiful implementation of dynamic programming in Haskell, where the dependencies among array entries are defined entirely by the expressions in them. The key is that you really only need to specify the contents of each array entry one time. I don't think Gaussian elimination follows this pattern, and so it seems it might not be a good fit for immutable arrays. It would be interesting to see how it works out, however.
You can use a Map to emulate a matrix. The key would be a pair of integers referencing the row and column. You'll want to use your own get x y function to ensure x < n and y < n though, instead of accessing the Map directly. (edit) You can use the compare function in Pervasives directly.
module OrderedPairs = struct
type t = int * int
let compare = Pervasives.compare
end
module Pairs = Map.Make (OrderedPairs)
let get_ n set x y =
assert( x < n && y < n );
Pairs.find (x,y) set
let set_ n set x y v =
assert( x < n && y < n );
Pairs.add (x,y) set v
Actually, having a general set of functions (get x y and set x y at a minimum), without specifying the implementation, would be an even better option. The functions then can be passed to the function, or be implemented in a module through a functor (a better solution, but having a set of functions just doing what you need would be a first step since you're new to OCaml). In this way you can use a Map, Array, Hashtbl, or a set of functions to access a file on the hard-drive to implement the matrix if you wanted. This is the really important aspect of functional programming; that you trust the interface over exploiting the side-effects, and not worry about the underlying implementation --since it's presumed to be pure.
The answers so far are using/emulating mutable data-types, but what does a functional approach look like?
To see, let's decompose the problem into some functional components:
Gaussian elimination involves a sequence of row operations, so it is useful first to define a function taking 2 rows and scaling factors, and returning the resultant row operation result.
The row operations we want should eliminate a variable (column) from a particular row, so lets define a function which takes a pair of rows and a column index and uses the previously defined row operation to return the modified row with that column entry zero.
Then we define two functions, one to convert a matrix into triangular form, and another to back-substitute a triangular matrix to the diagonal form (using the previously defined functions) by eliminating each column in turn. We could iterate or recurse over the columns, and the matrix could be defined as a list, vector or array of lists, vectors or arrays. The input is not changed, but a modified matrix is returned, so we can finally do:
let out_matrix = to_diagonal (to_triangular in_matrix);
What makes it functional is not whether the data-types (array or list) are mutable, but how they they are used. This approach may not be particularly 'clever' or be the most efficient way to do Gaussian eliminations in OCaml, but using pure functions lets you express the algorithm cleanly.