Implications of foldr vs. foldl (or foldl') - recursion

Firstly, Real World Haskell, which I am reading, says to never use foldl and instead use foldl'. So I trust it.
But I'm hazy on when to use foldr vs. foldl'. Though I can see how they work differently laid out in front of me, I'm too stupid to understand which is better when. I guess it seems to me like it shouldn't really matter which is used, as they both produce the same answer (don't they?). In fact, my previous experience with this construct is from Ruby's inject and Clojure's reduce, which don't seem to have "left" and "right" versions. (Side question: which version do they use?)
Any insight that can help a smarts-challenged sort like me would be much appreciated!

The recursion for foldr f x ys where ys = [y1,y2,...,yk] looks like
f y1 (f y2 (... (f yk x) ...))
whereas the recursion for foldl f x ys looks like
f (... (f (f x y1) y2) ...) yk
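To see the two bracketings concretely, here is a small Haskell sketch that folds over element names and prints the expression each fold builds. The names foldrShape and foldlShape are illustrative, not library functions.

```haskell
-- Render the expression foldr builds: f y1 (f y2 (... (f yk x) ...)).
foldrShape :: String -> [String] -> String
foldrShape z = foldr (\y acc -> "(f " ++ y ++ " " ++ acc ++ ")") z

-- Render the expression foldl builds: f (... (f (f x y1) y2) ...) yk.
foldlShape :: String -> [String] -> String
foldlShape z = foldl (\acc y -> "(f " ++ acc ++ " " ++ y ++ ")") z
```

For example, foldrShape "x" ["y1","y2","y3"] gives "(f y1 (f y2 (f y3 x)))", while foldlShape "x" ["y1","y2","y3"] gives "(f (f (f x y1) y2) y3)".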
An important difference here is that if the result of f y z can be computed using only its first argument y (the list element), then foldr doesn't need to examine the entire list. For example
foldr (&&) False (repeat False)
returns False whereas
foldl (&&) False (repeat False)
never terminates. (Note: repeat False creates an infinite list where every element is False.)
On the other hand, foldl' is tail recursive and strict. If you know that you'll have to traverse the whole list no matter what (e.g., summing the numbers in a list), then foldl' is more space- (and probably time-) efficient than foldr.
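A short Haskell sketch of both points: a foldr whose combining function can ignore the accumulator finishes even on an infinite list, while foldl' keeps the running total evaluated. anyR and sumStrict are illustrative names; foldl' is the one from Data.List.

```haskell
import Data.List (foldl')

-- foldr can stop early: (||) never looks at acc once p x is True,
-- so this terminates on an infinite list as soon as a match appears.
anyR :: (a -> Bool) -> [a] -> Bool
anyR p = foldr (\x acc -> p x || acc) False

-- foldl' forces the accumulator at each step (to weak head normal form),
-- so summing a long list does not build a chain of unevaluated (+) thunks.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0
```

Usage: anyR (> 10) [1 ..] returns True, and sumStrict [1 .. 100000] returns 5000050000.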

(Diagrams of foldr and foldl as expression trees; the images are not preserved.)
Context: Fold on the Haskell wiki

Their semantics differ, so you can't just interchange foldl and foldr. One folds the elements up from the left, the other from the right, so the operator gets applied in a different order. This matters for all non-associative operations, such as subtraction.
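A quick check with subtraction (the names leftDiff and rightDiff are illustrative):

```haskell
-- foldl groups from the left:  ((0 - 1) - 2) - 3
leftDiff :: Int
leftDiff = foldl (-) 0 [1, 2, 3]

-- foldr groups from the right: 1 - (2 - (3 - 0))
rightDiff :: Int
rightDiff = foldr (-) 0 [1, 2, 3]
```

leftDiff evaluates to -6 and rightDiff to 2, so the two folds are not interchangeable for (-).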
Haskell.org has an interesting article on the subject.

In short, foldr is better when the combining function is lazy in its second argument. Read more at the Haskell wiki's Stack Overflow page (pun intended).

The reason foldl' is preferred to foldl for 99% of all uses is that it can run in constant space for most uses.
Take the function sum = foldl (+) 0, or the same definition with foldl'. When foldl' is used, the running sum is calculated immediately at each step, so applying sum to an infinite list will just run forever, most likely in constant space (if you're using fixed-size types like Int, Double, or Float; an Integer will use more than constant space once the number grows larger than maxBound :: Int).
With foldl, a thunk is built up (like a recipe of how to get the answer, which can be evaluated later, rather than storing the answer). These thunks can take up a lot of space, and in this case, it’s much better to evaluate the expression than to store the thunk (leading to a stack overflow… and leading you to… oh never mind)
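To show what "immediately calculated" means, here is a hand-written strict left fold. strictFoldl is a hypothetical name; it is meant to behave like Data.List.foldl' on lists, not to reproduce its actual source. The seq forces each new accumulator before recursing, whereas a lazy foldl passes the unevaluated f acc x along (with optimisation, GHC's strictness analysis can sometimes rescue the lazy version, but you shouldn't rely on it).

```haskell
-- A strict left fold: evaluate the new accumulator (to weak head normal
-- form) with seq before the recursive call, so no thunk chain builds up.
strictFoldl :: (b -> a -> b) -> b -> [a] -> b
strictFoldl _ acc [] = acc
strictFoldl f acc (x:xs) =
  let acc' = f acc x
  in acc' `seq` strictFoldl f acc' xs
```

Usage: strictFoldl (+) 0 [1 .. 1000000] returns 500000500000 without accumulating a million pending additions.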
Hope that helps.

By the way, Ruby's inject and Clojure's reduce are foldl (or foldl1, depending on which version you use). Usually, when there is only one form in a language, it is a left fold, including Python's reduce, Perl's List::Util::reduce, C++'s accumulate, C#'s Aggregate, Smalltalk's inject:into:, PHP's array_reduce, Mathematica's Fold, etc. Common Lisp's reduce defaults to left fold but there's an option for right fold.

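As Konrad points out, their semantics are different. They don't even have the same type (shown here specialised to lists, as older GHC versions printed them; since GHC 7.10 both are generalised to any Foldable):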
ghci> :t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
ghci> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
ghci>
For example, the list append operator (++) can be implemented with foldr as
(++) = flip (foldr (:))
while
(++) = flip (foldl (:))
will give you a type error.
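A runnable version of the foldr definition, under a hypothetical name so it doesn't clash with the Prelude's (++):

```haskell
-- foldr (:) ys xs rebuilds xs with ys as the final tail, i.e. xs ++ ys;
-- flip puts the two arguments back in (++) order.
append :: [a] -> [a] -> [a]
append = flip (foldr (:))

-- The foldl version is rejected: foldl wants a function of type
-- (b -> a -> b), but (:) :: c -> [c] -> [c] would force b to be
-- both c and [c], an infinite type.
-- append' = flip (foldl (:))   -- does not type-check
```

Usage: append [1, 2] [3, 4] returns [1, 2, 3, 4].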

Related

How does term-rewriting based evaluation work?

The Pure programming language is apparently based on term rewriting, instead of the lambda-calculus that traditionally underlies similar-looking languages.
...what qualitative, practical difference does this make? In fact, what is the difference in the way that it evaluates expressions?
The linked page provides a lot of examples of term rewriting being useful, but it doesn't actually describe what it does differently from function application, except that it has rather flexible pattern matching (and pattern matching as it appears in Haskell and ML is nice, but not fundamental to the evaluation strategy). Values are matched against the left side of a definition and substituted into the right side - isn't this just beta reduction?
The matching of patterns, and substitution into output expressions, superficially looks a bit like syntax-rules to me (or even the humble #define), but the main feature of that is obviously that it happens before rather than during evaluation, whereas Pure is fully dynamic and there is no obvious phase separation in its evaluation system (and in fact otherwise Lisp macro systems have always made a big noise about how they are not different from function application). Being able to manipulate symbolic expression values is cool'n'all, but also seems like an artifact of the dynamic type system rather than something core to the evaluation strategy (pretty sure you could overload operators in Scheme to work on symbolic values; in fact you can even do it in C++ with expression templates).
So what is the mechanical/operational difference between term rewriting (as used by Pure) and traditional function application, as the underlying model of evaluation, when substitution happens in both?
Term rewriting doesn't have to look anything like function application, but languages like Pure emphasise this style because a) beta-reduction is simple to define as a rewrite rule and b) functional programming is a well-understood paradigm.
A counter-example would be a blackboard or tuple-space paradigm, which term-rewriting is also well-suited for.
One practical difference between beta-reduction and full term-rewriting is that rewrite rules can operate on the definition of an expression, rather than just its value. This includes pattern-matching on reducible expressions:
-- Functional style
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- Compose f and g before mapping, to prevent traversing xs twice
result = map (compose f g) xs
-- Term-rewriting style: spot double-maps before they're reduced
map f (map g xs) = map (compose f g) xs
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- All double maps are now automatically fused
result = map f (map g xs)
Notice that we can do this with LISP macros (or C++ templates), since they are term-rewriting systems, but this style blurs LISP's crisp distinction between macros and functions.
CPP's #define isn't equivalent, since it's not safe or hygienic (syntactically valid programs can become invalid after preprocessing).
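Haskell has a restricted form of this idea in GHC's RULES pragma: the programmer states an equation, and the optimiser may rewrite matching expressions. Rules only apply when compiling with optimisation, there is no guarantee a rule fires, and GHC does not check that a rule is correct. A minimal sketch, using a hypothetical myMap so it doesn't interact with the fusion rules base already has for map:

```haskell
-- A plain map, marked NOINLINE so that calls stay visible to the rule below.
myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (x:xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Rewrite rule: fuse two traversals into one. GHC trusts this equation;
-- it does not verify that both sides are equal.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}
```

Because the rule is semantics-preserving, myMap (+1) (myMap (*2) [1, 2, 3]) returns [3, 5, 7] whether or not it fires.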
We can also add ad-hoc clauses to existing functions as we need them, e.g.
plus (times x y) (times x z) = times x (plus y z)
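To make the idea concrete in Haskell, here is a toy rewriter over a hypothetical Expr type that applies the factoring rule above wherever it matches, working bottom-up. It is only an illustration of matching on expression structure, not how Pure is implemented.

```haskell
-- A tiny symbolic expression language (hypothetical, for illustration).
data Expr = Var String | Plus Expr Expr | Times Expr Expr
  deriving (Eq, Show)

-- One rewrite rule: plus (times x y) (times x z) = times x (plus y z).
-- It matches on the shape of the expression, not on a computed value.
factor :: Expr -> Maybe Expr
factor (Plus (Times x y) (Times x' z))
  | x == x' = Just (Times x (Plus y z))
factor _ = Nothing

-- Rewrite the children first, then try the rule at the root; if it fires,
-- rewrite the result again. Each firing removes one Times node, so this
-- terminates.
rewrite :: Expr -> Expr
rewrite e = case e of
    Plus a b  -> tryRule (Plus (rewrite a) (rewrite b))
    Times a b -> tryRule (Times (rewrite a) (rewrite b))
    Var _     -> e
  where
    tryRule t = maybe t rewrite (factor t)
```

Usage: rewrite (Plus (Times (Var "x") (Var "y")) (Times (Var "x") (Var "z"))) returns Times (Var "x") (Plus (Var "y") (Var "z")).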
Another practical consideration is that rewrite rules must be confluent if we want deterministic results, i.e. we get the same result regardless of which order we apply the rules in. No algorithm can check this for us (it's undecidable in general), and the search space is far too large for individual tests to tell us much. Instead we must convince ourselves that our system is confluent by some formal or informal proof; one way would be to follow systems that are already known to be confluent.
For example, beta-reduction is known to be confluent (via the Church-Rosser Theorem), so if we write all of our rules in the style of beta-reductions then we can be confident that our rules are confluent. Of course, that's exactly what functional programming languages do!

In pure functional languages, is data (strings, ints, floats.. ) also just functions?

I was thinking about pure Object Oriented Languages like Ruby, where everything, including numbers, int, floats, and strings are themselves objects. Is this the same thing with pure functional languages? For example, in Haskell, are Numbers and Strings also functions?
I know Haskell is based on lambda calculus, which represents everything, including data and operations, as functions. It would seem logical to me that a "purely functional language" would model everything as a function, as well as keep with the definition that a function must always return the same output for the same inputs and has no state.
It's okay to think about that theoretically, but...
Just like in Ruby not everything is an object (argument lists, for instance, are not objects), not everything in Haskell is a function.
For more reference, check out this neat post: http://conal.net/blog/posts/everything-is-a-function-in-haskell
@wrhall gives a good answer. However, you are somewhat correct that in the pure lambda calculus it is consistent for everything to be a function, and the language is Turing-complete (capable of expressing any pure computation that Haskell, etc. can).
That gives you some very strange things, since the only thing you can do to anything is to apply it to something else. When do you ever get to observe something? You have some value f and want to know something about it; your only choice is to apply it to some value x to get f x, which is another function, and the only choice is to apply it to another value y, to get f x y, and so on.
Often I interpret the pure lambda calculus as talking about transformations on things that are not functions, but only capable of expressing functions itself. That is, I can make a function (with a bit of Haskelly syntax sugar for recursion & let):
purePlus = \zero succ natCase ->
  let plus = \m n -> natCase m n (\m' -> succ (plus m' n))
  in plus (succ (succ zero)) (succ (succ zero))
Here I have expressed the computation 2+2 without needing to know that there are such things as non-functions. I simply took what I needed as arguments to the function I was defining, and the values of those arguments could be church encodings or they could be "real" numbers (whatever that means) -- my definition does not care.
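A runnable version of purePlus, with succ renamed to succ' to avoid shadowing the Prelude. The recursive case is plus (succ m') n = succ (plus m' n). intCase, peanoCase, and Nat are hypothetical helpers that show the same definition running unchanged on machine Ints and on a Peano data type:

```haskell
-- Every "number primitive" is passed in as an argument.
purePlus :: n -> (n -> n) -> (n -> n -> (n -> n) -> n) -> n
purePlus zero succ' natCase =
  let plus m n = natCase m n (\m' -> succ' (plus m' n))
  in plus (succ' (succ' zero)) (succ' (succ' zero))

-- Instance 1 (hypothetical): machine Ints, where natCase tests for zero.
intCase :: Int -> Int -> (Int -> Int) -> Int
intCase m ifZero ifSucc = if m == 0 then ifZero else ifSucc (m - 1)

-- Instance 2 (hypothetical): a unary Peano data type.
data Nat = Z | S Nat deriving (Eq, Show)

peanoCase :: Nat -> Nat -> (Nat -> Nat) -> Nat
peanoCase Z ifZero _ = ifZero
peanoCase (S m) _ ifSucc = ifSucc m
```

Usage: purePlus 0 (+1) intCase returns 4, and purePlus Z S peanoCase returns S (S (S (S Z))).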
And you could think the same thing of Haskell. There is no particular reason to think that there are things which are not functions, nor is there a particular reason to think that everything is a function. But Haskell's type system at least prevents you from applying an argument to a number (anybody thinking about fromInteger right now needs to hold their tongue! :-). In the above interpretation, it is because numbers are not necessarily modeled as functions, so you can't necessarily apply arguments to them.
In case it isn't clear by now, this whole answer has been somewhat of a technical/philosophical digression, and the easy answer to your question is "no, not everything is a function in functional languages". Functions are the things you can apply arguments to, that's all.
The "pure" in "pure functional" refers to the "freedom from side effects" kind of purity. It has little relation to the meaning of "pure" being used when people talk about a "pure object-oriented language", which simply means that the language deals purely (only) in objects.
The reason is that pure-as-in-only is a reasonable distinction to use to classify object-oriented languages, because there are languages like Java and C++, which clearly have values that don't have all that much in common with objects, and there are also languages like Python and Ruby, for which it can be argued that every value is an object.[1]
Whereas for functional languages, there are no practical languages that are "pure functional" in the sense that every value the language can manipulate is a function. It's certainly possible to program in such a language: the most basic versions of the lambda calculus don't have any notion of things that are not functions, but you can still do arbitrary computation with them by coming up with ways of representing the things you want to compute on as functions.[2]
But while the simplicity and minimalism of the lambda calculus tends to be great for proving things about programming, actually writing substantial programs in such a "raw" programming language is awkward. The function representation of basic things like numbers also tends to be very inefficient to implement on actual physical machines.
But there is a very important distinction between languages that encourage a functional style but allow untracked side effects anywhere, and ones that actually enforce that your functions are "pure" functions (similar to mathematical functions). Object-oriented programming is very strongly wedded to the use of impure computations,[3] so there are no practical object-oriented programming languages that are pure in this sense.
So the "pure" in "pure functional language" means something very different from the "pure" in "pure object-oriented language".[4] In each case the "pure vs. not pure" distinction is completely uninteresting when applied to the other kind of language, so there's no strong motive to standardise the use of the term.
[1] There are corner cases to pick at in all "pure object-oriented" languages that I know of, but that's not really very interesting. It's clear that the object metaphor goes much further in languages in which 1 is an instance of some class, and that class can be subclassed, than it does in languages in which 1 is something other than an object.
[2] All computation is about representation anyway. Computers don't know anything about numbers or anything else. They just have bit-patterns that we use to represent numbers, and operations on bit-patterns that happen to correspond to operations on numbers (because we designed them so that they would).
[3] This isn't fundamental either. You could design an object-oriented language that was pure in this sense. I tend to write most of my OO code to be pure anyway.
[4] If this seems obtuse, you might reflect that the terms "functional", "object", and "language" have vastly different meanings in other contexts also.
A very different angle on this question: all sorts of data in Haskell can be represented as functions, using a technique called Church encodings. This is a form of inversion of control: instead of passing data to functions that consume it, you hide the data inside a set of closures, and to consume it you pass in callbacks describing what to do with this data.
Any program that uses lists, for example, can be translated into a program that uses functions instead of lists:
-- | A list corresponds to a function of this type:
type ChurchList a r = (a -> r -> r)  -- ^ how to handle a cons cell
                   -> r              -- ^ how to handle the empty list
                   -> r              -- ^ result of processing the list
listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs
That function is taking a concrete list as its starting point, but that's not necessary. You can build up ChurchList functions out of just pure functions:
-- | The empty 'ChurchList'.
nil :: ChurchList a r
nil = \f z -> z
-- | Add an element at the front of a 'ChurchList'.
cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f x (xs f z)
foldChurchList :: (a -> r -> r) -> r -> ChurchList a r -> r
foldChurchList f z xs = xs f z
-- The input list is folded into a new ChurchList, so its result type
-- must be that output list type.
mapChurchList :: (a -> b) -> ChurchList a (ChurchList b r) -> ChurchList b r
mapChurchList f = foldChurchList step nil
  where step x = cons (f x)
filterChurchList :: (a -> Bool) -> ChurchList a (ChurchList a r) -> ChurchList a r
filterChurchList pred = foldChurchList step nil
  where step x xs = if pred x then cons x xs else xs
That last function uses Bool, but of course we can replace Bool with functions as well:
-- | A Bool can be represented as a function that chooses between two
-- given alternatives.
type ChurchBool r = r -> r -> r
true, false :: ChurchBool r
true a _ = a
false _ b = b
filterChurchList' :: (a -> ChurchBool (ChurchList a r)) -> ChurchList a (ChurchList a r) -> ChurchList a r
filterChurchList' pred = foldChurchList step nil
  where step x xs = pred x (cons x xs) xs
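A small self-contained check of these encodings, repeating the core definitions so it compiles on its own. fromChurch is a hypothetical helper that converts back to an ordinary list by passing (:) and []. With a plain type synonym, mapping requires the input list's result type to be the output ChurchList type; a newtype with RankNTypes would hide that, at the cost of more machinery.

```haskell
-- Condensed copies of the Church list definitions.
type ChurchList a r = (a -> r -> r) -> r -> r

nil :: ChurchList a r
nil = \_ z -> z

cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f x (xs f z)

listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs

-- Hypothetical helper: consume a Church list with (:) and [].
fromChurch :: ChurchList a [a] -> [a]
fromChurch xs = xs (:) []

-- Map by folding the input into a new Church list.
mapChurch :: (a -> b) -> ChurchList a (ChurchList b r) -> ChurchList b r
mapChurch f xs = xs (\x acc -> cons (f x) acc) nil
```

Usage: fromChurch (mapChurch (*10) (cons 1 (cons 2 nil))) returns [10, 20], and fromChurch (listToCPS "abc") returns "abc".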
This sort of transformation can be done for basically any type, so in theory, you could get rid of all "value" types in Haskell, and keep only the () type, the (->) and IO type constructors, return and >>= for IO, and a suitable set of IO primitives. This would obviously be hella impractical—and it would perform worse (try writing tailChurchList :: ChurchList a r -> ChurchList a r for a taste).
Is getChar :: IO Char a function or not? The Haskell Report doesn't define the term, but it does call getChar a function (see here). (Well, at least we can say that it is a function.)
So I think the answer is YES.
I don't think there can be a correct definition of "function" except "everything is a function". (What is a "correct definition"? Good question...) Consider the following example:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
f :: Applicative f => f Int
f = pure 1
g1 :: Maybe Int
g1 = f
g2 :: Int -> Int
g2 = f
Is f a function or datatype? It depends.
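A trimmed, runnable version of the same idea, with f renamed to one and g1/g2 renamed to asMaybe and asFunction purely for readability (Control.Applicative is no longer needed, since pure is exported by the Prelude in current GHC):

```haskell
-- One polymorphic definition...
one :: Applicative f => f Int
one = pure 1

-- ...is plain data at Maybe Int,
asMaybe :: Maybe Int
asMaybe = one

-- ...and a constant function at Int -> Int, via the Applicative
-- instance for ((->) r), where pure = const.
asFunction :: Int -> Int
asFunction = one
```

Usage: asMaybe is Just 1, and asFunction 42 is 1.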

What are practical examples of the higher-order functions foldl and foldr?

The typical academic example is to sum a list.
Are there real world examples of the use of fold that will shed light on its utility ?
fold is perhaps the most fundamental operation on sequences. Asking for its utility is like asking for the utility of a for loop in an imperative language.
Given a list (or array, or tree, or ..), a starting value, and a function, the fold operator reduces the list to a single result. It is also the natural catamorphism (destructor) for lists.
Any operations that take a list as input, and produce an output after inspecting the elements of the list can be encoded as folds. E.g.
sum = fold (+) 0
length = fold (λx n → 1 + n) 0
reverse = fold (λx xs → xs ++ [x]) []
map f = fold (λx ys → f x : ys) []
filter p = fold (λx xs → if p x then x : xs else xs) []
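In the notation above, fold corresponds to Haskell's foldr. Translated into runnable Haskell (the primed names are only there to avoid clashing with the Prelude), the same definitions look like this:

```haskell
sum' :: Num a => [a] -> a
sum' = foldr (+) 0

length' :: [a] -> Int
length' = foldr (\_ n -> 1 + n) 0

-- Quadratic, because of the repeated (++), but faithful to the definition.
reverse' :: [a] -> [a]
reverse' = foldr (\x xs -> xs ++ [x]) []

map' :: (a -> b) -> [a] -> [b]
map' f = foldr (\x ys -> f x : ys) []

filter' :: (a -> Bool) -> [a] -> [a]
filter' p = foldr (\x xs -> if p x then x : xs else xs) []
```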
The fold operator is not specific to lists, but can be generalised in a uniform way to ‘regular’ datatypes.
So, as one of the most fundamental operations on a wide variety of data types, it certainly does have some use out there. Being able to recognize when an algorithm can be described as a fold is a useful skill that will lead to cleaner code.
References:
A tutorial on the universality and expressiveness of fold
Writing foldl in terms of foldr
On folds
Lots And Lots Of foldLeft Examples lists the following functions:
sum
product
count
average
last
penultimate
contains
get
to string
reverse
unique
to set
double
insertion sort
pivot (part of quicksort)
encode (count consecutive elements)
decode (generate consecutive elements)
group (into sublists of even sizes)
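Two of the entries in that list, sketched in Haskell as illustrations (these definitions are not taken from the linked page): a one-pass average that folds a (sum, count) pair with foldl', and insertion sort as a foldr over Data.List.insert.

```haskell
import Data.List (foldl', insert)

-- Average in a single pass over the list. foldl' only forces the pair
-- itself (weak head normal form), not its components, so the step
-- function forces both components with seq. Returns NaN for [].
average :: [Double] -> Double
average xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (accS, accN) x =
      let s' = accS + x
          n' = accN + 1
      in s' `seq` n' `seq` (s', n')

-- Insertion sort: insert each element into an already sorted accumulator.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []
```

Usage: average [1, 2, 3, 4] returns 2.5, and insertionSort [3, 1, 2] returns [1, 2, 3].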
My lame answer is that:
foldr is for reducing the problem to the primitive case and then assembling the result back up (it behaves like non-tail recursion)
foldl is for reducing the problem and assembling the solution at every step, so that at the primitive case you have the solution ready (it behaves like tail recursion / iteration)
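The two shapes correspond to these hand-written definitions (myFoldr and myFoldl are hypothetical names mirroring the standard list versions). In a lazy language, the non-tail call in myFoldr only builds a deep chain of pending work when f actually needs its second argument.

```haskell
-- foldr: recurse to the end first, then combine on the way back up.
-- The recursive call is an argument to f, so it is not a tail call.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z [] = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- foldl: combine at every step and carry the partial result forward.
-- The recursive call is in tail position, like a loop.
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ acc [] = acc
myFoldl f acc (x:xs) = myFoldl f (f acc x) xs
```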
This question reminded me immediately of Ralf Lämmel's talk Going Bananas (so named because the fold operator notation, (| and |), looks like bananas). There are quite illustrative examples of mapping recursion to folds and even one fold to the other.
The classic paper (which is quite difficult at first) is Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire, named after the look of other operators.

Folds versus recursion in Erlang

According to Learn you some Erlang :
Pretty much any function you can think of that reduces lists to 1 element can be expressed as a fold. [...]
This means fold is universal in the sense that you can implement pretty much any other recursive function on lists with a fold
My first thought when writing a function that takes a list and reduces it to 1 element is to use recursion.
What are the guidelines that should help me decide whether to use recursion or a fold?
Is this a stylistic consideration or are there other factors as well (performance, readability, etc.)?
I personally prefer recursion over fold in Erlang (unlike in other languages, e.g. Haskell). I don't find fold more readable than recursion. For example:
fsum(L) -> lists:foldl(fun(X,S) -> S+X end, 0, L).
or
fsum(L) ->
F = fun(X,S) -> S+X end,
lists:foldl(F, 0, L).
vs
rsum(L) -> rsum(L, 0).
rsum([], S) -> S;
rsum([H|T], S) -> rsum(T, H+S).
It seems like more code, but it is pretty straightforward and idiomatic Erlang. Using fold requires less code, but the difference becomes smaller and smaller as the payload grows. Imagine we want to filter the odd values and map them to their squares.
lcfoo(L) -> [ X*X || X<-L, X band 1 =:= 1].
fmfoo(L) ->
lists:map(fun(X) -> X*X end,
lists:filter(fun(X) when X band 1 =:= 1 -> true; (_) -> false end, L)).
ffoo(L) -> lists:foldr(
    fun(X, A) when X band 1 =:= 1 -> [X*X|A];
       (_, A) -> A end,
    [], L).
rfoo([]) -> [];
rfoo([H|T]) when H band 1 =:= 1 -> [H*H | rfoo(T)];
rfoo([_|T]) -> rfoo(T).
Here the list comprehension wins, the recursive function comes in second place, and the fold version is ugly and less readable.
And finally, it is not true that fold is faster than the recursive version, especially when compiled to native (HiPE) code.
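For comparison, the same task (keep the odd values and square them) written in Haskell in the same four styles. The names mirror the Erlang ones and are otherwise arbitrary; which version reads best is as much a matter of taste here as in Erlang.

```haskell
-- List comprehension.
lcFoo :: [Int] -> [Int]
lcFoo xs = [x * x | x <- xs, odd x]

-- filter, then map.
fmFoo :: [Int] -> [Int]
fmFoo = map (\x -> x * x) . filter odd

-- foldr: square odd elements while rebuilding the list.
fFoo :: [Int] -> [Int]
fFoo = foldr (\x acc -> if odd x then x * x : acc else acc) []

-- Direct recursion.
rFoo :: [Int] -> [Int]
rFoo [] = []
rFoo (x:xs)
  | odd x     = x * x : rFoo xs
  | otherwise = rFoo xs
```

All four return [1, 9, 25] for [1 .. 5].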
Edit:
Here is a fold version with the fun bound to a variable, as requested:
ffoo2(L) ->
    F = fun(X, A) when X band 1 =:= 1 -> [X*X|A];
           (_, A) -> A
        end,
    lists:foldr(F, [], L).
I don't see how it is more readable than rfoo/1, and I find the accumulator manipulation in particular more complicated and less obvious than direct recursion. It is even longer code.
folds are usually both more readable (since everybody knows what they do) and faster, due to optimized implementations in the runtime (especially foldl, which should always be tail recursive). It's worth noting that they are only a constant factor faster, not asymptotically faster, so it's usually premature optimization if you find yourself choosing one over the other for performance reasons.
Use standard recursion when you do fancy things, such as working on more than one element at a time, splitting into multiple processes and similar, and stick to higher-order functions (fold, map, ...) when they already do what you want.
I expect fold is done recursively, so you may want to look at trying to implement some of the various list functions, such as map or filter, with fold, and see how useful it can be.
Otherwise, if you are doing this recursively you may be re-implementing fold, basically.
Learn to use what comes with the language, is my thought.
This discussion on foldl and recursion is interesting:
Easy way to break foldl
If you look at the first paragraph in this introduction (you may want to read all of it), he states it better than I did.
http://www.cs.nott.ac.uk/~gmh/fold.pdf
Old thread but my experience is that fold works slower than a recursive function.

Why is foldl defined in a strange way in Racket?

In Haskell, like in many other functional languages, the function foldl is defined such that, for example, foldl (-) 0 [1,2,3,4] = -10.
This is OK, because foldl (-) 0 [1,2,3,4] is, by definition, ((((0 - 1) - 2) - 3) - 4).
But, in Racket, (foldl - 0 '(1 2 3 4)) is 2, because Racket "intelligently" calculates like this: (4 - (3 - (2 - (1 - 0)))), which indeed is 2.
Of course, if we define auxiliary function flip, like this:
(define (flip bin-fn)
(lambda (x y)
(bin-fn y x)))
then we could in Racket achieve the same behavior as in Haskell: instead of (foldl - 0 '(1 2 3 4)) we can write: (foldl (flip -) 0 '(1 2 3 4))
The question is: Why is foldl in racket defined in such an odd (nonstandard and nonintuitive) way, differently than in any other language?
The Haskell definition is not uniform. In Racket, the functions passed to both folds take their inputs in the same order, and therefore you can just replace foldl with foldr and get the same result. If you do that with the Haskell version you'd (usually) get a different result, and you can see this in the different types of the two.
(In fact, I think that in order to do a proper comparison you should avoid these toy numeric examples where both of the type variables are integers.)
This has the nice byproduct that you're encouraged to choose either foldl or foldr according to their semantic differences. My guess is that with Haskell's order you're likely to choose according to the operation. You have a good example for this: you've used foldl because you want to subtract each number, and that's such an "obvious" choice that it's easy to overlook the fact that foldl is usually a bad choice in a lazy language.
Another difference is that the Haskell version is more limited than the Racket version in the usual way: it operates on exactly one input list, whereas Racket can accept any number of lists. This makes it more important to have a uniform argument order for the input function.
Finally, it is wrong to assume that Racket diverged from "many other functional languages", since folding is far from a new trick, and Racket has roots that are far older than Haskell (or these other languages). The question could therefore go the other way: why is Haskell's foldl defined in a strange way? (And no, (-) is not a good excuse.)
Historical update:
Since this seems to bother people again and again, I did a little bit of legwork. This is not definitive in any way, just my second-hand guessing. Feel free to edit this if you know more, or even better, email the relevant people and ask. Specifically, I don't know the dates where these decisions were made, so the following list is in rough order.
First there was Lisp, and no mention of "fold"ing of any kind. Instead, Lisp has reduce which is very non-uniform, especially if you consider its type. For example, :from-end is a keyword argument that determines whether it's a left or a right scan and it uses different accumulator functions which means that the accumulator type depends on that keyword. This is in addition to other hacks: usually the first value is taken from the list (unless you specify an :initial-value). Finally, if you don't specify an :initial-value, and the list is empty, it will actually apply the function on zero arguments to get a result.
All of this means that reduce is usually used for what its name suggests: reducing a list of values into a single value, where the two types are usually the same. The conclusion here is that it's serving a kind of a similar purpose to folding, but it's not nearly as useful as the generic list iteration construct that you get with folding. I'm guessing that this means that there's no strong relation between reduce and the later fold operations.
The first relevant language that follows Lisp and has a proper fold is ML. The choice that was made there, as noted in newacct's answer below, was to go with the uniform types version (ie, what Racket uses).
The next reference is Bird & Wadler's ItFP (1988), which uses different types (as in Haskell). However, they note in the appendix that Miranda has the same type (as in Racket).
Miranda later on switched the argument order (ie, moved from the Racket order to the Haskell one). Specifically, that text says:
WARNING - this definition of foldl differs from that in older versions of Miranda. The one here is the same as that in Bird and Wadler (1988). The old definition had the two args of `op' reversed.
Haskell took a lot of stuff from Miranda, including the different types. (But of course I don't know the dates so maybe the Miranda change was due to Haskell.) In any case, it's clear at this point that there was no consensus, hence the reversed question above holds.
OCaml went with the Haskell direction and uses different types.
I'm guessing that "How to Design Programs" (aka HtDP) was written at roughly the same period, and they chose the same type. There is, however, no motivation or explanation — and in fact, after that exercise it's simply mentioned as one of the built-in functions.
Racket's implementation of the fold operations was, of course, the "built-ins" that are mentioned here.
Then came SRFI-1, and the choice was to use the same-type version (as Racket). This decision was questioned by John David Stone, who points to a comment in the SRFI that says
Note: MIT Scheme and Haskell flip F's arg order for their reduce and fold functions.
Olin later addressed this: all he said was:
Good point, but I want consistency between the two functions.
state-value first: srfi-1, SML
state-value last: Haskell
Note in particular his use of state-value, which suggests a view where consistent types are a possibly more important point than operator order.
"differently than in any other language"
As a counter-example, the foldl of Standard ML (a very old and influential functional language) also works this way: http://www.standardml.org/Basis/list.html#SIG:LIST.foldl:VAL
Racket's foldl and foldr (and also SRFI-1's fold and fold-right) have the property that
(foldr cons null lst) = lst
(foldl cons null lst) = (reverse lst)
I speculate the argument order was chosen for that reason.
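These identities can be checked in Haskell by giving a left fold Racket's argument order (element first, accumulator second). foldlElemFirst is a hypothetical name; it is simply Haskell's foldl with the combining function flipped, and it has the same list type as foldr.

```haskell
-- Racket/SRFI-1-style left fold: proc receives (element, accumulator).
foldlElemFirst :: (a -> b -> b) -> b -> [a] -> b
foldlElemFirst f = foldl (flip f)

-- With this argument order, (:) can be passed to either fold unchanged.
rebuild, reversed :: [a] -> [a]
rebuild  = foldr (:) []             -- returns the same list
reversed = foldlElemFirst (:) []    -- returns the list reversed
```

Usage: rebuild [1, 2, 3] is [1, 2, 3], reversed [1, 2, 3] is [3, 2, 1], and foldlElemFirst (-) 0 [1, 2, 3, 4] is 2, matching Racket's (foldl - 0 '(1 2 3 4)), while Haskell's own foldl (-) 0 [1, 2, 3, 4] is -10.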
From the Racket documentation, the description of foldl:
(foldl proc init lst ...+) → any/c
Two points of interest for your question are mentioned:
the input lsts are traversed from left to right
And
foldl processes the lsts in constant space
I'm gonna speculate on what the implementation might look like, with a single list for simplicity's sake:
(define (my-foldl proc init lst)
(define (iter lst acc)
(if (null? lst)
acc
(iter (cdr lst) (proc (car lst) acc))))
(iter lst init))
As you can see, the requirements of left-to-right traversal and constant space are met (notice the tail recursion in iter), but the order of the arguments for proc was never specified in the description. Hence, the result of calling the above code would be:
(my-foldl - 0 '(1 2 3 4))
> 2
If we had specified the order of the arguments for proc in this way:
(proc acc (car lst))
Then the result would be:
(my-foldl - 0 '(1 2 3 4))
> -10
My point is, the documentation for foldl doesn't make any assumptions on the evaluation order of the arguments for proc, it only has to guarantee that constant space is used and that the elements in the list are evaluated from left to right.
As a side note, you can get the desired evaluation order for your expression by simply writing this:
(- 0 1 2 3 4)
> -10

Resources