Time complexity of multiple string concatenation (join) in functional programming languages

Am I right that the only algorithm that can be implemented in functional programming languages like Haskell to concatenate multiple strings (i.e. implement join, which transforms a list of lines ["one", "two", "three"] into one line "onetwothree") has time complexity of order O(n^2), as described in this well-known post?
For example, if I work with immutable strings in Python and try to implement join, I'll get something like
def myjoin(list_of_strings):
    if not list_of_strings:
        return ""
    return list_of_strings[0] + myjoin(list_of_strings[1:])
Is it true that it is not possible to make it faster, for example, in Haskell?

First of all, Haskell is lazy: this means that if you write:
concat ["foo", "bar", "qux"]
it will not perform this operation until you request, for instance, the first character of the result. Even then it usually will not concatenate all the strings together but - depending on how the function is implemented - aims to do the minimal amount of work necessary to obtain that first character. If you request the first character but do not inspect it, you could even end up with an unevaluated expression such as succ 'f' instead of 'g', since again Haskell is lazy.
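To make this concrete, here is a small GHCi illustration (my own, using the Prelude's concat, which behaves like the definition below): the second string is never needed for the first three characters, so it is never evaluated, even though it is undefined.
ghci> take 3 (concat ["foo", undefined])
"foo"
ghci> head (concat ["foo", undefined])
'f'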
But let's assume that we are interested in the resulting string, and want to know every character. We can implement concat as:
concat :: [[a]] -> [a]
concat [] = []
concat (x:xs) = x ++ concat xs
and (++) as:
(++) :: [a] -> [a] -> [a]
(++) [] ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
Given that (:) works in O(1), (++) runs in O(a), where a is the length of the first list; the length b of the second list does not appear in the big-O expression at all.
So if we inspect concat, we see that for k input strings we perform k (++) operations, and the cost of each (++) is proportional to the length of its left operand, i.e. the length of that one input string. That means that if the sum of the lengths of the strings is n, concat is an O(n) algorithm.
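The quadratic behaviour from the question only appears when the concatenation is associated to the left, so that earlier partial results are copied again and again. A minimal sketch of the contrast (the names quadraticConcat and linearConcat are my own, for illustration):
-- Left-associated: ((([] ++ s1) ++ s2) ++ s3) ...
-- Each step copies the accumulator again, so this is O(n^2) overall.
quadraticConcat :: [[a]] -> [a]
quadraticConcat = foldl (++) []

-- Right-associated: s1 ++ (s2 ++ (s3 ++ ...))
-- Each element is copied exactly once, so this is O(n) overall.
linearConcat :: [[a]] -> [a]
linearConcat = foldr (++) []
Both return "onetwothree" for ["one", "two", "three"]; only the amount of copying differs.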

Related

What are practical examples of the higher-order functions foldl and foldr?

The typical academic example is to sum a list.
Are there real world examples of the use of fold that will shed light on its utility?
fold is perhaps the most fundamental operation on sequences. Asking for its utility is like asking for the utility of a for loop in an imperative language.
Given a list (or array, or tree, or ..), a starting value, and a function, the fold operator reduces the list to a single result. It is also the natural catamorphism (destructor) for lists.
Any operation that takes a list as input and produces an output after inspecting the elements of the list can be encoded as a fold. E.g.
sum = fold (+) 0
length = fold (λx n → 1 + n) 0
reverse = fold (λx xs → xs ++ [x]) []
map f = fold (λx ys → f x : ys) []
filter p = fold (λx xs → if p x then x : xs else xs) []
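For reference, here is a hedged, runnable Haskell rendering of the definitions above, written with foldr (which matches the way these lambdas take the element as their first argument); the primed names are mine, to avoid clashing with the Prelude:
-- Runnable versions of the examples above, written with foldr.
sum'     :: Num a => [a] -> a
sum'      = foldr (+) 0

length'  :: [a] -> Int
length'   = foldr (\_ n -> 1 + n) 0

reverse' :: [a] -> [a]
reverse'  = foldr (\x xs -> xs ++ [x]) []

map'     :: (a -> b) -> [a] -> [b]
map' f    = foldr (\x ys -> f x : ys) []

filter'  :: (a -> Bool) -> [a] -> [a]
filter' p = foldr (\x xs -> if p x then x : xs else xs) []
For example, map' (+1) [1,2,3] evaluates to [2,3,4] and filter' even [1..6] to [2,4,6].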
The fold operator is not specific to lists, but can be generalised in a uniform way to ‘regular’ datatypes.
So, as one of the most fundamental operations on a wide variety of data types, it certainly does have some use out there. Being able to recognize when an algorithm can be described as a fold is a useful skill that will lead to cleaner code.
References:
A tutorial on the universality and expressiveness of fold
Writing foldl in terms of foldr
On folds
Lots And Lots Of foldLeft Examples lists the following functions:
sum
product
count
average
last
penultimate
contains
get
to string
reverse
unique
to set
double
insertion sort
pivot (part of quicksort)
encode (count consecutive elements)
decode (generate consecutive elements)
group (into sublists of even sizes)
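A few of the functions in that list, sketched as left folds in Haskell (my own one-liners, not code taken from that article):
-- A handful of the foldLeft examples above, written as Haskell left folds.
count :: [a] -> Int
count = foldl (\n _ -> n + 1) 0

contains :: Eq a => a -> [a] -> Bool
contains y = foldl (\found x -> found || x == y) False

lastElem :: [a] -> Maybe a
lastElem = foldl (\_ x -> Just x) Nothing

toString :: Show a => [a] -> String
toString = foldl (\acc x -> acc ++ show x) ""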
My lame answer is that:
foldr is for reducing the problem to the primitive case and then assembling the result back up (it behaves like non-tail recursion);
foldl is for reducing the problem and assembling the solution at every step, so that at the primitive case the solution is already ready (it behaves like tail recursion / iteration).
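To make the difference in association concrete, here is a small illustrative trick (mine, not part of the original answer): fold with a function that builds a string showing the parenthesisation instead of computing a value.
-- Show how foldr and foldl parenthesise the same input differently.
showAssoc :: IO ()
showAssoc = do
  putStrLn (foldr (\x acc -> "(" ++ x ++ " + " ++ acc ++ ")") "z" ["a","b","c"])
  -- prints: (a + (b + (c + z)))
  putStrLn (foldl (\acc x -> "(" ++ acc ++ " + " ++ x ++ ")") "z" ["a","b","c"])
  -- prints: (((z + a) + b) + c)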
This question immediately reminded me of a talk by Ralf Lämmel, Going Bananas (the fold operator notation looks like a banana: (| and |)). There are quite illustrative examples of mapping recursion to folds, and even of converting one fold into the other.
The classic paper (which is quite difficult at first) is Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire, named after the look of the other operators.

Miranda going through lists

Is there an easy way to go through a list?
Let's say I wanted to access the 5th item in the list, not knowing it was a "B":
["A","A","A","A","B","A","A","A","A"]
Is there a way I can do it without having to go through the whole list?
I do not know Miranda that well, but I expect the functions skip and take to be available.
You can address the 5th element by building a function out of skip and take. If skip and take are not available, it is easy to create them yourself:
skip: skips the first y elements of a list; when y is greater than the number of items in the list, it returns an empty list.
take: takes the first y elements of a list; when y is greater than the number of items in the list, the full list is returned.
skip y [] = []
skip 0 xs = xs
skip y (x:xs) = skip (y-1) xs
take y [] = []
take 0 xs = []
take y (x:xs) = x : take (y-1) xs
elementAt x xs = take 1 (skip x xs)
Lists are inductive datatypes. This means that functions defined over lists - for instance, accessing the nth element - are defined by recursion. The data structure you are looking for appears to be an array, which allows constant time lookup. The easiest way to find the element at an index in a list is directly:
lookup :: Int -> [a] -> Maybe a
lookup n [] = Nothing
lookup 0 (x:xs) = Just x
lookup n (x:xs) = lookup (n - 1) xs
Another way to do this would be to use the ! operator. Let's say you have a program with defined data in the list, such as:
plist = [A,A,A,A,B,A,A,A,A]
then executing plist!4 will give you the 5th element of that list (indexing starts at 0, so index 4 is the 5th element: 0, 1, 2, 3, 4).
So plist!4 returns B.
Lists are not arrays.
You can only access elements beginning from the first. Think of lists as streams (like a song playing on the radio). Lists may be of infinite length (as the radio never stops).
Most programmers use syntactic sugar, which hides the nature of lists behind an easier syntax.
Miranda automatically loads a default library named stdenv.m, which you can study.
Now, let's think about your problem:
You want to ignore ("drop") all elements before the 5th and then take the first element of the remaining list.
This is expressed in Miranda as:
nth :: num -> [*] -> *
nth n = hd . drop (n-1)
This is a function with an explicit type declaration, to show that the function works with every list (the elements have the wildcard type *).
Sample:
plist :: [[char]]
plist = ["A","A","A","A","B","A","A","A","A"]
result :: [char]
result = nth 5 plist
If you want to code your functions with error handling, you need techniques to catch that there is no 5th element in your list.
As seen above, one technique is "Maybe". Another is continuations.
A bad technique is to check the length of the list first, because computing the length of an infinite list never terminates.
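For comparison, here is a total version in Haskell syntax (my own sketch, not Miranda code): it returns Nothing instead of crashing, and it works on infinite lists because it never computes the length.
-- Safe 1-based indexing that works on infinite lists:
-- it walks at most n cells and never computes the length.
nthMaybe :: Int -> [a] -> Maybe a
nthMaybe _ []    = Nothing
nthMaybe 1 (x:_) = Just x
nthMaybe n (_:xs)
  | n < 1        = Nothing
  | otherwise    = nthMaybe (n - 1) xs
For example, nthMaybe 5 ["A","A","A","A","B"] is Just "B", and nthMaybe 5 [1..] is Just 5.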

Confused over behavior of List.mapi in F#

I am building some equations in F#, and when working on my polynomial class I found some odd behavior using List.mapi
Basically, each polynomial has an array, so 3*x^2 + 5*x + 6 would be [|6.0; 5.0; 3.0|] in the array. When adding polynomials, if one array is longer than the other, I just need to append the extra elements to the result, and that is where I ran into a problem.
Later I want to generalize it to not always use a float, but that will come after I get more of it working.
So, the problem is that I expected List.mapi to return a List, not individual elements, but in order to put the lists together I had to put [] around my use of mapi, and I am curious why that is the case.
This is more complicated than I expected; I thought I should be able to just tell it to make a new List starting at a certain index, but I can't find any function for that.
type Polynomial() =
    let mutable coefficients:float [] = Array.empty
    member self.Coefficients with get() = coefficients
    static member (+) (v1:Polynomial, v2:Polynomial) =
        let ret = List.map2 (fun c p -> c + p) (List.ofArray v1.Coefficients) (List.ofArray v2.Coefficients)
        let a = List.mapi (fun i x -> x)
        match v1.Coefficients.Length - v2.Coefficients.Length with
        | x when x < 0 ->
            ret :: [((List.ofArray v1.Coefficients) |> a)]
        | x when x > 0 ->
            ret :: [((List.ofArray v2.Coefficients) |> a)]
        | _ -> [ret]
I think that a straightforward implementation using lists and recursion would be simpler in this case. An alternative implementation of the Polynomial class might look roughly like this:
// The type is immutable and takes initial list as constructor argument
type Polynomial(coeffs:float list) =
    // Local recursive function implementing the addition using lists
    let rec add l1 l2 =
        match l1, l2 with
        | x::xs, y::ys -> (x + y) :: (add xs ys)
        | rest, [] | [], rest -> rest
    member self.Coefficients = coeffs
    static member (+) (v1:Polynomial, v2:Polynomial) =
        // Add lists using local function
        let newList = add v1.Coefficients v2.Coefficients
        // Wrap result into new polynomial
        Polynomial(newList)
It is worth noting that you don't really need a mutable field in the class, since the + operator creates and returns a new instance of the type, so the type is fully immutable (as you'd usually want in F#).
The nice thing in the add function is that after processing all elements that are available in both lists, you can simply return the tail of the non-empty list as the rest.
If you wanted to implement the same functionality using arrays, then it may be better to use a simple for loop (since arrays are, in principle, imperative, the usual imperative patterns are usually the best option for dealing with them). However, I don't think there is any particular reason for preferring arrays (maybe performance, but that would have to be evaluated later during the development).
As Pavel points out, the :: operator prepends a single element to the front of a list (see the add function above, which demonstrates that). You could write what you wanted using @, which concatenates lists, or using Array.concat (which concatenates a sequence of arrays).
An implementation using higher-order functions and arrays is also possible - the best version I can come up with would look like this:
let add (a1:_[]) (a2:_[]) =
    // Add parts where both arrays have elements
    let l = min a1.Length a2.Length
    let both = Array.map2 (+) a1.[0 .. l-1] a2.[0 .. l-1]
    // Take the rest of the longer array
    let rest =
        if a1.Length > a2.Length
        then a1.[l .. a1.Length - 1]
        else a2.[l .. a2.Length - 1]
    // Concatenate them
    Array.concat [ both; rest ]
add [| 6; 5; 3 |] [| 7 |]
This uses slices (e.g. a.[0 .. l]) which give you a part of an array - you can use these to take the parts where both arrays have elements and the remaining part of the longer array.
I think you're misunderstanding what the :: operator does. It's not used to concatenate two lists; it's used to prepend a single element to a list. Consequently, its type is:
'a -> 'a list -> 'a list
In your case, you're giving ret as the first argument, and ret is itself a float list. Consequently, it expects the second argument to be of type float list list - hence why you need to add an extra [] around the second argument to make it compile - and that will also be the result type of your operator +, which is probably not what you want.
You can use List.concat to concatenate two (or more) lists, but that is inefficient. In your example, I don't see the point of using lists at all - all this converting back & forth is going to be costly. For arrays, you can use Array.append, which is better.
By the way, it's not clear what the purpose of mapi in your code is at all. It's exactly the same as map, except for the index argument, but you're not using the index, and your mapping is the identity function, so it's effectively a no-op. What is it for?

Please walk me through this "Erlang Programming" recursive sample

From page 90 of Erlang Programming by Cesarini and Thomson, there is an example that has no detailed discussion. I'm quite the newbie to functional programming and recursive thinking, so I'm not familiar in solving problems this way.
"For example, the following function merges two lists (of the same length) by interleaving
their values: "
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
mergeL([],[],Zs) -> Zs.
mergeR(Xs,[Y|Ys],Zs) -> mergeL(Xs,Ys,[Y|Zs]);
mergeR([],[],Zs) -> Zs.
How does this work? Thanks!
step through it
merge([1,2],[3,4])
reverse(mergeL([1,2],[3,4],[]))
reverse(mergeR([2],[3,4],[1]))
reverse(mergeL([2],[4],[3,1]))
reverse(mergeR([], [4], [2,3,1]))
reverse(mergeL([], [], [4,2,3,1]))
reverse([4,2,3,1])
[1,3,2,4]
It's always good to work these functions through by hand on a piece of paper, with a small input, when you're trying to figure them out. You'll quickly see how they work.
This function is called first:
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
The empty list [] passed to mergeL is the accumulator - this is where the answer will come from. Note that the first function calls mergeL - the left merge.
Let us pretend that this function is called as so:
merge([1, 2, 3], [a, b, c])
Two lists of the same length. This first function then calls mergeL:
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
mergeL([],[],Zs) -> Zs.
There are two clauses in the left merge. A call to mergeL will try to match these clauses in top-down order.
The second of these clauses has three parameters, the first two of which are empty lists []. However, the first time mergeL is called these two lists aren't empty - they are the lists Xs and Ys - so the first clause matches.
Let's break out the matches. This is the call to mergeL:
mergeL([1, 2, 3], [a, b, c], [])
and it matches the first clause in the following fashion:
X = 1
Xs = [2, 3]
Ys = [a, b, c]
Zs = []
This is because of the special form of the list:
[X | Xs]
This means match X to the head of the list (an individual item) and make Xs the tail of the list (a list).
We then build up the new function call. We can add the value X to the front of the list Zs in the same way we pattern-matched it out, so we get the first mergeR call:
mergeR([2, 3], [a, b, c], [1])
The final argument is a one-item list caused by adding an item at the head of an empty list.
This then zips through until the end.
Actually, one of the termination clauses is redundant: with two lists of the same length, the recursion always exhausts in the final clause of mergeL, so the final clause of mergeR is never reached (but I will leave checking that as an exercise for the reader).
What the example does is define a few states that the recursion will go through. There are 3 'functions' that are defined:
merge, mergeL and mergeR.
The lists to merge are Xs and Ys, whereas the Zs are the result of the merge.
The merge will start with calling 'merge' and supplying two lists. The first step is to call mergeL with the two lists to merge, and an empty resultset.
[X|Xs] takes the first element of the list (very much like array_shift would). This element is added to the head of the resultset ([X|Zs] does this). This resultset (containing one element now) is then passed to the next call, mergeR. mergeR does the same thing, only it takes an element from the second list. This behaviour will continue as long as the lists fed to mergeL or mergeR are not empty.
When mergeL or mergeR is called with two empty lists ([]) and a resultset (Zs), it will return the resultset (and not do another run, thus stopping the recursion).
Summary:
The start of the recursion is the first line, which defines 'merge'. This start will set the whole thing in motion by calling the first mergeL.
The body of the recursion is lines 2 and 4, which define the behaviour of mergeL and mergeR, which call each other.
The stop of the recursion is defined by lines 3 and 5, which basically tell the whole thing what to do when there are no more elements in the lists.
Hope this helps!
I always look for those functions that will terminate the recursion first, in this case:
mergeL([],[],Zs) -> Zs.
and
mergeR([],[],Zs) -> Zs.
both of those will basically finish the "merging" when the first two parameters are empty lists.
So then I look at the first call of the function:
merge(Xs,Ys) -> lists:reverse(mergeL(Xs,Ys,[])).
Ignoring the reverse for a second, you will see that the last parameter is an empty list. So I'd expect the various mergeL and mergeR calls to move the elements of the input lists into that final parameter - and when they have all been moved, the function will basically terminate (although it finally calls the reverse function, of course).
And that is exactly what the remaining functions do:
mergeL([X|Xs],Ys,Zs) -> mergeR(Xs,Ys,[X|Zs]);
takes the first element of Xs and puts it at the front of Zs, and
mergeR(Xs,[Y|Ys],Zs) -> mergeL(Xs,Ys,[Y|Zs]);
takes the first element of Ys and puts it at the front of Zs. The calling of mergeR from mergeL and vice versa does the interleaving.
What's interesting to see (and easy to fix) is that the lists Xs and Ys must be of the same length, or you'll end up calling mergeL or mergeR with an empty list that won't match either [X | Xs] or [Y | Ys].
And the reason for the reverse is simply the relative efficiency of [X | Zs] versus Zs ++ [X]: prepending to the front of a list is much more efficient than appending to the end.

Implications of foldr vs. foldl (or foldl')

Firstly, Real World Haskell, which I am reading, says to never use foldl and instead use foldl'. So I trust it.
But I'm hazy on when to use foldr vs. foldl'. Though I can see the structure of how they work differently laid out in front of me, I'm too stupid to understand which is better when. I guess it seems to me like it shouldn't really matter which is used, as they both produce the same answer (don't they?). In fact, my previous experience with this construct is from Ruby's inject and Clojure's reduce, which don't seem to have "left" and "right" versions. (Side question: which version do they use?)
Any insight that can help a smarts-challenged sort like me would be much appreciated!
The recursion for foldr f x ys where ys = [y1,y2,...,yk] looks like
f y1 (f y2 (... (f yk x) ...))
whereas the recursion for foldl f x ys looks like
f (... (f (f x y1) y2) ...) yk
An important difference here is that if the result of f x y can be computed using only the value of x, then foldr doesn't need to examine the entire list. For example
foldr (&&) False (repeat False)
returns False whereas
foldl (&&) False (repeat False)
never terminates. (Note: repeat False creates an infinite list where every element is False.)
On the other hand, foldl' is tail recursive and strict. If you know that you'll have to traverse the whole list no matter what (e.g., summing the numbers in a list), then foldl' is more space- (and probably time-) efficient than foldr.
foldr builds a right-nested chain of applications, while foldl builds a left-nested one (see the diagrams on the Haskell wiki's Fold page).
Context: Fold on the Haskell wiki
Their semantics differ so you can't just interchange foldl and foldr. The one folds the elements up from the left, the other from the right. That way, the operator gets applied in a different order. This matters for all non-associative operations, such as subtraction.
Haskell.org has an interesting article on the subject.
In short, foldr is better when the accumulator function is lazy in its second argument. Read more at the Haskell wiki's Stack Overflow article (pun intended).
The reason foldl' is preferred to foldl for 99% of all uses is that it can run in constant space for most uses.
Take the function sum = foldl['] (+) 0. When foldl' is used, the sum is immediately calculated, so applying sum to an infinite list will just run forever, and most likely in constant space (if you’re using things like Ints, Doubles, Floats. Integers will use more than constant space if the number becomes larger than maxBound :: Int).
With foldl, a thunk is built up (like a recipe of how to get the answer, which can be evaluated later, rather than storing the answer). These thunks can take up a lot of space, and in this case, it’s much better to evaluate the expression than to store the thunk (leading to a stack overflow… and leading you to… oh never mind)
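A small sketch of that difference (my own illustration, not from the answer): with plain foldl the additions pile up as one big unevaluated expression, while foldl' forces the accumulator at every step.
import Data.List (foldl')

-- foldl  (+) 0 [1,2,3] builds the whole expression first:
--   ((0 + 1) + 2) + 3              -- one big thunk, only reduced at the end
-- foldl' (+) 0 [1,2,3] forces the accumulator at each step:
--   0 + 1 = 1, 1 + 2 = 3, 3 + 3 = 6  -- constant space
sumLazy, sumStrict :: [Integer] -> Integer
sumLazy   = foldl  (+) 0   -- may blow the stack on a long list
sumStrict = foldl' (+) 0   -- runs in constant space
For instance, sumStrict [1..10000000] runs in constant space, while sumLazy on the same list can exhaust the stack.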
Hope that helps.
By the way, Ruby's inject and Clojure's reduce are foldl (or foldl1, depending on which version you use). Usually, when there is only one form in a language, it is a left fold, including Python's reduce, Perl's List::Util::reduce, C++'s accumulate, C#'s Aggregate, Smalltalk's inject:into:, PHP's array_reduce, Mathematica's Fold, etc. Common Lisp's reduce defaults to left fold but there's an option for right fold.
As Konrad points out, their semantics are different. They don't even have the same type:
ghci> :t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
ghci> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
ghci>
For example, the list append operator (++) can be implemented with foldr as
(++) = flip (foldr (:))
while
(++) = flip (foldl (:))
will give you a type error.
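As a quick sanity check of the foldr version (a sketch; appendR is my own name for it):
-- The foldr-based append, spelled out and evaluated on a small example.
appendR :: [a] -> [a] -> [a]
appendR = flip (foldr (:))

-- appendR [1,2] [3,4]
--   = foldr (:) [3,4] [1,2]
--   = 1 : (2 : [3,4])
--   = [1,2,3,4]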

Resources