How does term-rewriting based evaluation work? - functional-programming

The Pure programming language is apparently based on term rewriting, instead of the lambda-calculus that traditionally underlies similar-looking languages.
...what qualitative, practical difference does this make? In fact, what is the difference in the way that it evaluates expressions?
The linked page provides a lot of examples of term rewriting being useful, but it doesn't actually describe what it does differently from function application, except that it has rather flexible pattern matching (and pattern matching as it appears in Haskell and ML is nice, but not fundamental to the evaluation strategy). Values are matched against the left side of a definition and substituted into the right side - isn't this just beta reduction?
The matching of patterns, and substitution into output expressions, superficially looks a bit like syntax-rules to me (or even the humble #define), but the main feature of that is obviously that it happens before rather than during evaluation, whereas Pure is fully dynamic and there is no obvious phase separation in its evaluation system (and in fact otherwise Lisp macro systems have always made a big noise about how they are not different from function application). Being able to manipulate symbolic expression values is cool'n'all, but also seems like an artifact of the dynamic type system rather than something core to the evaluation strategy (pretty sure you could overload operators in Scheme to work on symbolic values; in fact you can even do it in C++ with expression templates).
So what is the mechanical/operational difference between term rewriting (as used by Pure) and traditional function application, as the underlying model of evaluation, when substitution happens in both?

Term rewriting doesn't have to look anything like function application, but languages like Pure emphasise this style because a) beta-reduction is simple to define as a rewrite rule and b) functional programming is a well-understood paradigm.
A counter-example would be a blackboard or tuple-space paradigm, which term-rewriting is also well-suited for.
One practical difference between beta-reduction and full term-rewriting is that rewrite rules can operate on the definition of an expression, rather than just its value. This includes pattern-matching on reducible expressions:
-- Functional style
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- Compose f and g before mapping, to prevent traversing xs twice
result = map (compose f g) xs
-- Term-rewriting style: spot double-maps before they're reduced
map f (map g xs) = map (compose f g) xs
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- All double maps are now automatically fused
result = map f (map g xs)
Notice that we can do this with LISP macros (or C++ templates), since they are term-rewriting systems, but this style blurs LISP's crisp distinction between macros and functions.
CPP's #define isn't equivalent, since it's not safe or hygienic (syntactically-valid programs can become invalid after pre-processing).
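The closest analogue within Haskell itself is probably GHC's RULES pragma, which layers a small compile-time rewrite system on top of ordinary evaluation. A minimal sketch, assuming GHC (the rule name and example module are illustrative, and the rule may or may not fire depending on inlining):
module MapFusion where

-- Compile-time rewrite rule: the simplifier replaces the left-hand side
-- with the right-hand side wherever it spots the pattern.
{-# RULES
"map/map fusion" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

result :: [Int]
result = map (+ 1) (map (* 2) [1, 2, 3])  -- candidate for fusion into map ((+ 1) . (* 2))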
We can also define ad-hoc clauses to existing functions as we need them, eg.
plus (times x y) (times x z) = times x (plus y z)
Another practical consideration is that rewrite rules must be confluent if we want deterministic results, ie. we get the same result regardless of which order we apply the rules in. No algorithm can check this for us (it's undecidable in general) and the search space is far too large for individual tests to tell us much. Instead we must convince ourselves that our system is confluent by some formal or informal proof; one way would be to follow systems which are already known to be confluent.
For example, beta-reduction is known to be confluent (via the Church-Rosser Theorem), so if we write all of our rules in the style of beta-reductions then we can be confident that our rules are confluent. Of course, that's exactly what functional programming languages do!

Related

What is so special about Monads?

A monad is a mathematical structure which is heavily used in (pure) functional programming, basically Haskell. However, there are many other mathematical structures available, like for example applicative functors, strong monads, or monoids. Some are more specific, some are more generic. Yet, monads are much more popular. Why is that?
One explanation I came up with, is that they are a sweet spot between genericity and specificity. This means monads capture enough assumptions about the data to apply the algorithms we typically use and the data we usually have fulfills the monadic laws.
Another explanation could be that Haskell provides syntax for monads (do-notation), but not for other structures, which means Haskell programmers (and thus functional programming researchers) are intuitively drawn towards monads, where a more generic or specific (efficient) function would work as well.
I suspect that the disproportionately large attention given to this one particular type class (Monad) over the many others is mainly a historical fluke. People often associate IO with Monad, although the two are independently useful ideas (as are list reversal and bananas). Because IO is magical (having an implementation but no denotation) and Monad is often associated with IO, it's easy to fall into magical thinking about Monad.
(Aside: it's questionable whether IO even is a monad. Do the monad laws hold? What do the laws even mean for IO, i.e., what does equality mean? Note the problematic association with the state monad.)
If a type m :: * -> * has a Monad instance, you get Turing-complete composition of functions with type a -> m b. This is a fantastically useful property. You get the ability to abstract various Turing-complete control flows away from specific meanings. It's a minimal composition pattern that supports abstracting any control flow for working with types that support it.
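For a concrete (if tiny) sketch of that composition, here is Kleisli composition via (>=>) from Control.Monad; the function names are made up for the example:
import Control.Monad ((>=>))

-- Two functions of shape a -> m b; each can refuse to produce a value.
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

-- The Monad instance for Maybe is what lets these chain into another a -> m b.
recipThenSqrt :: Double -> Maybe Double
recipThenSqrt = safeRecip >=> safeSqrt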
Compare this to Applicative, for instance. There, you get only composition patterns with computational power equivalent to a push-down automaton. Of course, it's true that more types support composition with more limited power. And it's true that when you limit the power available, you can do additional optimizations. These two reasons are why the Applicative class exists and is useful. But things that can be instances of Monad usually are, so that users of the type can perform the most general operations possible with the type.
Edit:
By popular demand, here are some functions using the Monad class:
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
(*&&) :: Monad m => m Bool -> m Bool -> m Bool
x *&& y = ifM x y (return False)
(*||) :: Monad m => m Bool -> m Bool -> m Bool
x *|| y = ifM x (return True) y
notM :: Monad m => m Bool -> m Bool
notM x = x >>= return . not
Combining those with do syntax (or the raw >>= operator) gives you name binding, indefinite looping, and complete boolean logic. That's a well-known set of primitives sufficient to give Turing completeness. Note how all the functions have been lifted to work on monadic values, rather than simple values. All monadic effects are bound only when necessary - only the effects from the chosen branch of ifM are bound into its final value. Both *&& and *|| ignore their second argument when possible. And so on..
Now, those type signatures may not involve functions for every monadic operand, but that's just a cognitive simplification. There would be no semantic difference, ignoring bottoms, if all the non-function arguments and results were changed to () -> m a. It's just friendlier to users to optimize that cognitive overhead out.
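As a small usage sketch (countdown is a made-up example, assuming the definitions above are in scope):
-- Loop in IO until the counter reaches zero, returning the final value.
countdown :: Int -> IO Int
countdown = whileM (\n -> pure (n > 0))
                   (\n -> print n >> pure (n - 1))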
Now, let's look at what happens to those functions with the Applicative interface.
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Well, uh. It got the same type signature. But there's a really big problem here already. The effects of both x and y are bound into the composed structure, regardless of which one's value is selected.
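To see that concretely, here is a sketch with f = IO (demoIfA is a made-up name); both putStrLn effects run even though only the first value is kept:
demoIfA :: IO String
demoIfA = ifA (pure True)
              (putStrLn "then branch ran" >> pure "then")
              (putStrLn "else branch ran" >> pure "else")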
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <$> step x) (pure x)
Well, ok, that seems like it'd be ok, except for the fact that it's an infinite loop because ifA will always execute both branches... Except it's not even that close. pure x has the type f a. whileA p step <$> step x has the type f (f a). This isn't even an infinite loop. It's a compile error. Let's try again..
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well shoot. Don't even get that far. whileA p step has the type a -> f a. If you try to use it as the first argument to <*>, it grabs the Applicative instance for the top type constructor, which is (->), not f. Yeah, this isn't gonna work either.
In fact, the only function from my Monad examples that would work with the Applicative interface is notM. That particular function works just fine with only a Functor interface, in fact. The rest? They fail.
Of course it's to be expected that you can write code using the Monad interface that you can't with the Applicative interface. It is strictly more powerful, after all. But what's interesting is what you lose. You lose the ability to compose functions that change what effects they have based on their input. That is, you lose the ability to write certain control-flow patterns that compose functions with types a -> f b.
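A minimal sketch of what "effects that depend on input" means (readAndEcho is made up for the example): the number of print effects is decided by a value that itself came from an effect, which no Applicative-only composition of the same parts can express.
readAndEcho :: IO ()
readAndEcho = do
  n <- readLn                  -- result of one effect...
  mapM_ print [1 .. n :: Int]  -- ...determines how many further effects run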
Turing-complete composition is exactly what makes the Monad interface interesting. If it didn't allow Turing-complete composition, it would be impossible for you, the programmer, to compose together IO actions in any particular control flow that wasn't nicely prepackaged for you. It was the fact that you can use the Monad primitives to express any control flow that made the IO type a feasible way to manage the IO problem in Haskell.
Many more types than just IO have semantically valid Monad interfaces. And it happens that Haskell has the language facilities to abstract over the entire interface. Due to those factors, Monad is a valuable class to provide instances for, when possible. Doing so gets you access to all the existing abstract functionality provided for working with monadic types, regardless of what the concrete type is.
So if Haskell programmers seem to always care about Monad instances for a type, it's because it's the most generically-useful instance that can be provided.
First, I think that it is not quite true that monads are much more popular than anything else; both Functor and Monoid have many instances that are not monads. But they are both very specific; Functor provides mapping, Monoid concatenation. Applicative is the one class that I can think of that is probably underused given its considerable power, due largely to its being a relatively recent addition to the language.
But yes, monads are extremely popular. Part of that is the do notation; a lot of Monoids provide Monad instances that merely append values to a running accumulator (essentially an implicit writer). The blaze-html library is a good example.
The reason, I think, is the power of the type signature (>>=) :: Monad m => m a -> (a -> m b) -> m b. While fmap and mappend are useful, what they can do is fairly narrowly constrained. bind, however, can express a wide variety of things. It is, of course, canonized in the IO monad, perhaps the best pure functional approach to IO before streams and FRP (and still useful beside them for simple tasks and defining components). But it also provides implicit state (Reader/Writer/ST), which can avoid some very tedious variable passing. The various state monads, especially, are important because they provide a guarantee that state is single threaded, allowing mutable structures in pure (non-IO) code before fusion.
But bind has some more exotic uses, such as flattening nested data structures (the List and Set monads), both of which are quite useful in their place (and I usually see them used desugared, calling liftM or (>>=) explicitly, so it is not a matter of do notation). So while Functor and Monoid (and the somewhat rarer Foldable, Alternative, Traversable, and others) provide a standardized interface to a fairly straightforward function, Monad's bind is considerably more flexible.
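For instance, a sketch of the list monad's "flattening" use of bind, written desugared as mentioned (pairs is a made-up example):
pairs :: [(Int, Char)]
pairs = [1, 2] >>= \n -> "ab" >>= \c -> return (n, c)
-- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]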
In short, I think that all your reasons have some role; the popularity of monads is due to a combination of historical accident (do notation and the late definition of Applicative) and their combination of power and generality (relative to functors, monoids, and the like) and understandability (relative to arrows).
Well, first let me explain what the role of monads is: Monads are very powerful, but in a certain sense: You can pretty much express anything using a monad. Haskell as a language doesn't have things like action loops, exceptions, mutation, goto, etc. Monads can be expressed within the language (so they are not special) and make all of these reachable.
There is a positive and a negative side to this: It's positive that you can express all those control structures you know from imperative programming and a whole bunch of them you don't. I have just recently developed a monad that lets you reenter a computation somewhere in the middle with a slightly changed context. That way you can run a computation, and if it fails, you just try again with slightly adjusted values. Furthermore monadic actions are first class, and that's how you build things like loops or exception handling. While while is primitive in C, in Haskell it's actually just a regular function.
The negative side is that monads give you pretty much no guarantees whatsoever. They are so powerful that you are allowed to do whatever you want, to put it simply. In other words, just as in imperative languages, it can be hard to reason about code by just looking at it.
The more general abstractions are more general in the sense that they allow some concepts to be expressed which you can't express as monads. But that's only part of the story. Even for monads you can use a style known as applicative style, in which you use the applicative interface to compose your program from small isolated parts. The benefit of this is that you can reason about code by just looking at it and you can develop components without having to pay attention to the rest of your system.
What is so special about monads?
The monadic interface's main claim to fame in Haskell is its role in the replacement of the original and unwieldy dialogue-based I/O mechanism.
As for their status in a formal investigative context...it is merely an iteration of a seemingly-cyclic endeavour which is now (2021 Oct) approximately one half-century old:
During the 1960s, several researchers began work on proving things about programs. Efforts were made to prove that:
A program was correct.
Two programs with different code computed the same answers when given the same inputs.
One program was faster than another.
A given program would always terminate.
While these are abstract goals, they are all, really, the same as the practical goal of "getting the program debugged".
Several difficult problems emerged from this work. One was the problem of specification: before one can prove that a program is correct, one must specify the meaning of "correct", formally and unambiguously. Formal systems for specifying the meaning of a program were developed, and they looked suspiciously like programming languages.
The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky.
(emphasis by me.)
...back when "programming languages" - apart from an intrepid few - were most definitely imperative.
Anyone for elevating this mystery to the rank of Millennium problem? Solving it would definitely advance the science of computing and the engineering of software, one way or the other...
Monads are special because of do notation, which lets you write imperative programs in a functional language. Monad is the abstraction that allows you to splice together imperative programs from smaller, reusable components (which are themselves imperative programs). Monad transformers are special because they represent enhancing an imperative language with new features.
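A minimal sketch of that "splicing" reading (greet is a made-up example); the do block is nothing more than sugar over (>>=):
greet :: IO ()
greet = do
  name <- getLine
  putStrLn ("Hello, " ++ name)

-- the same program, desugared:
greet' :: IO ()
greet' = getLine >>= \name -> putStrLn ("Hello, " ++ name)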

Fixed-Point Combinators

I am new to the world of fixed-point combinators and I guess they are used to recurse on anonymous lambdas, but I haven't really got to use them, or even been able to wrap my head around them completely.
I have seen the example in Javascript for a Y-combinator but haven't been able to successfully run it.
The question here is, can someone give an intuitive answer to:
What are Fixed-point combinators, (not just theoretically, but in context of some example, to reveal what exactly is the fixed-point in that context)?
What are the other kinds of fixed-point combinators, apart from the Y-combinator?
Bonus Points: If the example is not just in one language, preferably in Clojure as well.
UPDATE:
I have been able to find a simple example in Clojure, but still find it difficult to understand the Y-Combinator itself:
(defn Y [r]
  ((fn [f] (f f))
   (fn [f]
     (r (fn [x] ((f f) x))))))
Though the example is concise, I find it difficult to understand what is happening within the function. Any help provided would be useful.
Suppose you wanted to write the factorial function. Normally, you would write it as something like
function fact(n) = if n=0 then 1 else n * fact(n-1)
But that uses explicit recursion. If you wanted to use the Y-combinator instead, you could first abstract fact as something like
function factMaker(myFact) = lambda n. if n=0 then 1 else n * myFact(n-1)
This takes an argument (myFact) which it calls where the "true" fact would have called itself. I call this style of function "Y-ready", meaning it's ready to be fed to the Y-combinator.
The Y-combinator uses factMaker to build something equivalent to the "true" fact.
newFact = Y(factMaker)
Why bother? Two reasons. The first is theoretical: we don't really need recursion if we can "simulate" it using the Y-combinator.
The second is more pragmatic. Sometimes we want to wrap each function call with some extra code to do logging or profiling or memoization or a host of other things. If we try to do this to the "true" fact, the extra code will only be called for the original call to fact, not all the recursive calls. But if we want to do this for every call, including all the recursive calls, we can do something like
loggingFact = LoggingY(factMaker)
where LoggingY is a modified version of the Y combinator that introduces logging. Notice that we did not need to change factMaker at all!
All this is more motivation for why the Y-combinator matters than a detailed explanation of how that particular implementation of Y works (because there are many different ways to implement Y).
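For reference, a sketch of the same idea in Haskell, where the library function fix from Data.Function plays the role of Y (the numeric type is just for illustration):
import Data.Function (fix)

factMaker :: (Integer -> Integer) -> Integer -> Integer
factMaker myFact n = if n == 0 then 1 else n * myFact (n - 1)

newFact :: Integer -> Integer
newFact = fix factMaker   -- newFact 5 == 120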
To answer your second question, about fixed-point combinators other than Y: there are countably infinitely many standard fixed-point combinators, that is, combinators fix that satisfy the equation
fix f = f (fix f)
There are also countably many non-standard fixed-point combinators, which satisfy the equation
fix f = f (f (fix f))
etc. Standard fixed-point combinators are recursively enumerable, but non-standard ones are not. Please see the following web page for examples, references and discussion.
http://okmij.org/ftp/Computation/fixed-point-combinators.html#many-fixes

In pure functional languages, is data (strings, ints, floats.. ) also just functions?

I was thinking about pure object-oriented languages like Ruby, where everything, including numbers, ints, floats, and strings, is itself an object. Is it the same with pure functional languages? For example, in Haskell, are numbers and strings also functions?
I know Haskell is based on lambda calculus, which represents everything, including data and operations, as functions. It would seem logical to me that a "purely functional language" would model everything as a function, as well as keep with the definition that a function always returns the same output for the same inputs and has no state.
It's okay to think about that theoretically, but...
Just like in Ruby not everything is an object (argument lists, for instance, are not objects), not everything in Haskell is a function.
For more reference, check out this neat post: http://conal.net/blog/posts/everything-is-a-function-in-haskell
#wrhall gives a good answer. However you are somewhat correct that in the pure lambda calculus it is consistent for everything to be a function, and the language is Turing-complete (capable of expressing any pure computation that Haskell, etc. is).
That gives you some very strange things, since the only thing you can do to anything is to apply it to something else. When do you ever get to observe something? If you have some value f and want to know something about it, your only choice is to apply it to some value x to get f x, which is another function, and your only choice is then to apply that to another value y to get f x y, and so on.
Often I interpret the pure lambda calculus as talking about transformations on things that are not functions, but only capable of expressing functions itself. That is, I can make a function (with a bit of Haskelly syntax sugar for recursion & let):
purePlus = \zero succ natCase ->
  let plus = \m n -> natCase m n (\m' -> succ (plus m' n))
  in plus (succ (succ zero)) (succ (succ zero))
Here I have expressed the computation 2+2 without needing to know that there are such things as non-functions. I simply took what I needed as arguments to the function I was defining, and the values of those arguments could be Church encodings or they could be "real" numbers (whatever that means) -- my definition does not care.
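For instance, here is a sketch that instantiates the purePlus above with ordinary Ints rather than Church numerals; the natCase argument is just case analysis on a natural number (the name four is made up):
four :: Int
four = purePlus 0 (+ 1) (\n z s -> if n == 0 then z else s (n - 1))
-- four == 4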
And you could think the same thing of Haskell. There is no particular reason to think that there are things which are not functions, nor is there a particular reason to think that everything is a function. But Haskell's type system at least prevents you from applying an argument to a number (anybody thinking about fromInteger right now needs to hold their tongue! :-). In the above interpretation, it is because numbers are not necessarily modeled as functions, so you can't necessarily apply arguments to them.
In case it isn't clear by now, this whole answer has been somewhat of a technical/philosophical digression, and the easy answer to your question is "no, not everything is a function in functional languages". Functions are the things you can apply arguments to, that's all.
The "pure" in "pure functional" refers to the "freedom from side effects" kind of purity. It has little relation to the meaning of "pure" being used when people talk about a "pure object-oriented language", which simply means that the language manipulates purely (only) in objects.
The reason is that pure-as-in-only is a reasonable distinction to use to classify object-oriented languages, because there are languages like Java and C++, which clearly have values that don't have all that much in common with objects, and there are also languages like Python and Ruby, for which it can be argued that every value is an object. [1]
Whereas for functional languages, there are no practical languages which are "pure functional" in the sense that every value the language can manipulate is a function. It's certainly possible to program in such a language. The most basic versions of the lambda calculus don't have any notion of things that are not functions, but you can still do arbitrary computation with them by coming up with ways of representing the things you want to compute on as functions. [2]
But while the simplicity and minimalism of the lambda calculus tends to be great for proving things about programming, actually writing substantial programs in such a "raw" programming language is awkward. The function representation of basic things like numbers also tends to be very inefficient to implement on actual physical machines.
But there is a very important distinction between languages that encourage a functional style but allow untracked side effects anywhere, and ones that actually enforce that your functions are "pure" functions (similar to mathematical functions). Object-oriented programming is very strongly wed to the use of impure computations [3], so there are no practical object-oriented programming languages that are pure in this sense.
So the "pure" in "pure functional language" means something very different from the "pure" in "pure object-oriented language".4 In each case the "pure vs not pure" distinction is one that is completely uninteresting applied to the other kind of language, so there's no very strong motive to standardise the use of the term.
[1] There are corner cases to pick at in all "pure object-oriented" languages that I know of, but that's not really very interesting. It's clear that the object metaphor goes much further in languages in which 1 is an instance of some class, and that class can be sub-classed, than it does in languages in which 1 is something other than an object.
[2] All computation is about representation anyway. Computers don't know anything about numbers or anything else. They just have bit-patterns that we use to represent numbers, and operations on bit-patterns that happen to correspond to operations on numbers (because we designed them so that they would).
[3] This isn't fundamental either. You could design a "pure" object-oriented language that was pure in this sense. I tend to write most of my OO code to be pure anyway.
[4] If this seems obtuse, you might reflect that the terms "functional", "object", and "language" have vastly different meanings in other contexts also.
A very different angle on this question: all sorts of data in Haskell can be represented as functions, using a technique called Church encodings. This is a form of inversion of control: instead of passing data to functions that consume it, you hide the data inside a set of closures, and to consume it you pass in callbacks describing what to do with this data.
Any program that uses lists, for example, can be translated into a program that uses functions instead of lists:
-- | A list corresponds to a function of this type:
type ChurchList a r = (a -> r -> r)  -- ^ how to handle a cons cell
                   -> r              -- ^ how to handle the empty list
                   -> r              -- ^ result of processing the list
listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs
That function is taking a concrete list as its starting point, but that's not necessary. You can build up ChurchList functions out of just pure functions:
-- | The empty 'ChurchList'.
nil :: ChurchList a r
nil = \f z -> z
-- | Add an element at the front of a 'ChurchList'.
cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f x (xs f z)
foldChurchList :: (a -> r -> r) -> r -> ChurchList a r -> r
foldChurchList f z xs = xs f z
mapChurchList :: (a -> b) -> ChurchList a r -> ChurchList b r
mapChurchList f = foldChurchList step nil
where step x = cons (f x)
filterChurchList :: (a -> Bool) -> ChurchList a r -> ChurchList a r
filterChurchList pred = foldChurchList step nil
where step x xs = if pred x then cons x xs else xs
That last function uses Bool, but of course we can replace Bool with functions as well:
-- | A Bool can be represented as a function that chooses between two
-- given alternatives.
type ChurchBool r = r -> r -> r
true, false :: ChurchBool r
true a _ = a
false _ b = b
filterChurchList' :: (a -> ChurchBool r) -> ChurchList a r -> ChurchList a r
filterChurchList' pred = foldChurchList step nil
where step x xs = pred x (cons x xs) xs
This sort of transformation can be done for basically any type, so in theory, you could get rid of all "value" types in Haskell, and keep only the () type, the (->) and IO type constructors, return and >>= for IO, and a suitable set of IO primitives. This would obviously be hella impractical—and it would perform worse (try writing tailChurchList :: ChurchList a r -> ChurchList a r for a taste).
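For completeness, a sketch of going back from the encoding to an ordinary list (churchListToList is a made-up name, not from any library):
-- Recover an ordinary list by handing (:) and [] to the Church-encoded one.
churchListToList :: ChurchList a [a] -> [a]
churchListToList xs = xs (:) []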
Is getChar :: IO Char a function or not? Haskell Report doesn't provide us with a definition. But it states that getChar is a function (see here). (Well, at least we can say that it is a function.)
So I think the answer is YES.
I don't think there can be a correct definition of "function" except "everything is a function". (What is a "correct definition"? Good question...) Consider the following example:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
f :: Applicative f => f Int
f = pure 1
g1 :: Maybe Int
g1 = f
g2 :: Int -> Int
g2 = f
Is f a function or datatype? It depends.

Why is foldl defined in a strange way in Racket?

In Haskell, like in many other functional languages, the function foldl is defined such that, for example, foldl (-) 0 [1,2,3,4] = -10.
This is OK, because foldl (-) 0 [1,2,3,4] is, by definition, ((((0 - 1) - 2) - 3) - 4).
But, in Racket, (foldl - 0 '(1 2 3 4)) is 2, because Racket "intelligently" calculates like this: (4 - (3 - (2 - (1 - 0)))), which indeed is 2.
Of course, if we define auxiliary function flip, like this:
(define (flip bin-fn)
  (lambda (x y)
    (bin-fn y x)))
then we could in Racket achieve the same behavior as in Haskell: instead of (foldl - 0 '(1 2 3 4)) we can write: (foldl (flip -) 0 '(1 2 3 4))
The question is: Why is foldl in racket defined in such an odd (nonstandard and nonintuitive) way, differently than in any other language?
The Haskell definition is not uniform. In Racket, the function given to both folds has the same order of inputs, and therefore you can just replace foldl by foldr and get the same result. If you do that with the Haskell version you'd get a different result (usually) — and you can see this in the different types of the two.
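For reference, a sketch of the two (list-only) Haskell definitions, shadowing the Prelude names, which makes that type difference visible: foldl's function takes (accumulator, element), foldr's takes (element, accumulator).
import Prelude hiding (foldl, foldr)

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ z []       = z
foldl f z (x : xs) = foldl f (f z x) xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z []       = z
foldr f z (x : xs) = f x (foldr f z xs)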
(In fact, I think that in order to do a proper comparison you should avoid these toy numeric examples where both of the type variables are integers.)
This has the nice byproduct where you're encouraged to choose either foldl or foldr according to their semantic differences. My guess is that with Haskell's order you're likely to choose according to the operation. You have a good example for this: you've used foldl because you want to subtract each number — and that's such an "obvious" choice that it's easy to overlook the fact that foldl is usually a bad choice in a lazy language.
Another difference is that the Haskell version is more limited than the Racket version in the usual way: it operates on exactly one input list, whereas Racket can accept any number of lists. This makes it more important to have a uniform argument order for the input function.
Finally, it is wrong to assume that Racket diverged from "many other functional languages", since folding is far from a new trick, and Racket has roots that are far older than Haskell (or these other languages). The question could therefore go the other way: why is Haskell's foldl defined in a strange way? (And no, (-) is not a good excuse.)
Historical update:
Since this seems to bother people again and again, I did a little bit of legwork. This is not definitive in any way, just my second-hand guessing. Feel free to edit this if you know more, or even better, email the relevant people and ask. Specifically, I don't know the dates where these decisions were made, so the following list is in rough order.
First there was Lisp, and no mention of "fold"ing of any kind. Instead, Lisp has reduce which is very non-uniform, especially if you consider its type. For example, :from-end is a keyword argument that determines whether it's a left or a right scan and it uses different accumulator functions which means that the accumulator type depends on that keyword. This is in addition to other hacks: usually the first value is taken from the list (unless you specify an :initial-value). Finally, if you don't specify an :initial-value, and the list is empty, it will actually apply the function on zero arguments to get a result.
All of this means that reduce is usually used for what its name suggests: reducing a list of values into a single value, where the two types are usually the same. The conclusion here is that it's serving a kind of a similar purpose to folding, but it's not nearly as useful as the generic list iteration construct that you get with folding. I'm guessing that this means that there's no strong relation between reduce and the later fold operations.
The first relevant language that follows Lisp and has a proper fold is ML. The choice that was made there, as noted in newacct's answer below, was to go with the uniform types version (ie, what Racket uses).
The next reference is Bird & Wadler's ItFP (1988), which uses different types (as in Haskell). However, they note in the appendix that Miranda has the same type (as in Racket).
Miranda later on switched the argument order (ie, moved from the Racket order to the Haskell one). Specifically, that text says:
WARNING - this definition of foldl differs from that in older versions of Miranda. The one here is the same as that in Bird and Wadler (1988). The old definition had the two args of `op' reversed.
Haskell took a lot of stuff from Miranda, including the different types. (But of course I don't know the dates so maybe the Miranda change was due to Haskell.) In any case, it's clear at this point that there was no consensus, hence the reversed question above holds.
OCaml went with the Haskell direction and uses different types.
I'm guessing that "How to Design Programs" (aka HtDP) was written at roughly the same period, and they chose the same type. There is, however, no motivation or explanation — and in fact, after that exercise it's simply mentioned as one of the built-in functions.
Racket's implementation of the fold operations was, of course, the "built-ins" that are mentioned here.
Then came SRFI-1, and the choice was to use the same-type version (as Racket). This decision was questioned by John David Stone, who points at a comment in the SRFI that says
Note: MIT Scheme and Haskell flip F's arg order for their reduce and fold functions.
Olin later addressed this: all he said was:
Good point, but I want consistency between the two functions.
state-value first: srfi-1, SML
state-value last: Haskell
Note in particular his use of state-value, which suggests a view where consistent types are a possibly more important point than operator order.
"differently than in any other language"
As a counter-example, Standard ML (ML is a very old and influential functional language)'s foldl also works this way: http://www.standardml.org/Basis/list.html#SIG:LIST.foldl:VAL
Racket's foldl and foldr (and also SRFI-1's fold and fold-right) have the property that
(foldr cons null lst) = lst
(foldl cons null lst) = (reverse lst)
I speculate the argument order was chosen for that reason.
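The corresponding Haskell properties need a flip, which is a sketch of the same point about argument order (propFoldr and propFoldl are made-up names):
propFoldr, propFoldl :: Bool
propFoldr = foldr (:) [] [1, 2, 3 :: Int] == [1, 2, 3]
propFoldl = foldl (flip (:)) [] [1, 2, 3 :: Int] == [3, 2, 1]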
From the Racket documentation, the description of foldl:
(foldl proc init lst ...+) → any/c
Two points of interest for your question are mentioned:
the input lsts are traversed from left to right
And
foldl processes the lsts in constant space
I'm gonna speculate on what the implementation for that might look like, with a single list for simplicity's sake:
(define (my-foldl proc init lst)
  (define (iter lst acc)
    (if (null? lst)
        acc
        (iter (cdr lst) (proc (car lst) acc))))
  (iter lst init))
As you can see, the requirements of left-to-right traversal and constant space are met (notice the tail recursion in iter), but the order of the arguments for proc was never specified in the description. Hence, the result of calling the above code would be:
(my-foldl - 0 '(1 2 3 4))
> 2
If we had specified the order of the arguments for proc in this way:
(proc acc (car lst))
Then the result would be:
(my-foldl - 0 '(1 2 3 4))
> -10
My point is, the documentation for foldl doesn't make any promises about the order in which the arguments are passed to proc, it only has to guarantee that constant space is used and that the elements in the list are evaluated from left to right.
As a side note, you can get the desired evaluation order for your expression by simply writing this:
(- 0 1 2 3 4)
> -10

What are best practices for including parameters such as an accumulator in functions?

I've been writing more Lisp code recently. In particular, recursive functions that take some data, and build a resulting data structure. Sometimes it seems I need to pass two or three pieces of information to the next invocation of the function, in addition to the user supplied data. Let's call these accumulators.
What is the best way to organize these interfaces to my code?
Currently, I do something like this:
(defun foo (user1 user2 &optional acc1 acc2 acc3)
  ;; do something
  (foo user1 user2 (cons x acc1) (cons y acc2) (cons z acc3)))
This works as I'd like it to, but I'm concerned because I don't really need to present the &optional parameters to the programmer.
3 approaches I'm somewhat considering:
have a wrapper function that a user is encouraged to use, which immediately invokes the extended definition.
use labels internally within a function whose signature is concise.
just start using a loop and variables. However, I'd prefer not to, since I'd like to really wrap my head around recursion.
Thanks guys!
If you want to write idiomatic Common Lisp, I'd recommend the loop and variables for iteration. Recursion is cool, but it's only one tool of many for the Common Lisper. Besides, tail-call elimination is not guaranteed by the Common Lisp spec.
That said, I'd recommend the labels approach if you have a structure, a tree for example, that is unavoidably recursive and you can't get tail calls anyway. Optional arguments let your implementation details leak out to the caller.
Your impulse to shield implementation details from the user is a smart one, I think. I don't know common lisp, but in Scheme you do it by defining your helper function in the public function's lexical scope.
(define (fibonacci n)
  (let fib-accum ((a 0)
                  (b 1)
                  (n n))
    (if (< n 1)
        a
        (fib-accum b (+ a b) (- n 1)))))
The let expression defines a function and binds it to a name that's only visible within the let, then invokes the function.
I have used all the options you mention. All have their merits, so it boils down to personal preference.
I have arrived at using whatever I deem appropriate. If I think that leaving the &optional accumulators in the API might make sense for the user, I leave it in. For example, in a reduce-like function, the accumulator can be used by the user for providing a starting value. Otherwise, I'll often rewrite it as a loop, do, or iter (from the iterate library) form, if it makes sense to perceive it as such. Sometimes, the labels helper is also used.

Resources