In pure functional languages, is data (strings, ints, floats.. ) also just functions? - functional-programming

I was thinking about pure object-oriented languages like Ruby, where everything, including numbers, ints, floats, and strings, is itself an object. Is it the same with pure functional languages? For example, in Haskell, are numbers and strings also functions?
I know Haskell is based on lambda calculus, which represents everything, including data and operations, as functions. It would seem logical to me that a "purely functional language" would model everything as a function, as well as keep with the definition that a function must always return the same output for the same inputs and has no state.

It's okay to think about that theoretically, but...
Just like in Ruby not everything is an object (argument lists, for instance, are not objects), not everything in Haskell is a function.
For more reference, check out this neat post: http://conal.net/blog/posts/everything-is-a-function-in-haskell

#wrhall gives a good answer. However, you are somewhat correct that in the pure lambda calculus it is consistent for everything to be a function, and the language is Turing-complete (capable of expressing any pure computation that Haskell, etc. can).
That gives you some very strange things, since the only thing you can do to anything is to apply it to something else. When do you ever get to observe something? If you have some value f and want to know something about it, your only choice is to apply it to some value x to get f x, which is another function, and your only choice is to apply that to another value y, to get f x y, and so on.
Often I interpret the pure lambda calculus as talking about transformations on things that are not functions, but only capable of expressing functions itself. That is, I can make a function (with a bit of Haskelly syntax sugar for recursion & let):
purePlus = \zero succ natCase ->
    let plus = \m n -> natCase m n (\m' -> succ (plus m' n))
    in plus (succ (succ zero)) (succ (succ zero))
Here I have expressed the computation 2+2 without needing to know that there are such things as non-functions. I simply took what I needed as arguments to the function I was defining, and the values of those arguments could be Church encodings or they could be "real" numbers (whatever that means) -- my definition does not care.
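To make that last point concrete, here is a hypothetical instantiation with ordinary Integers (natCaseInt and four are my names, not part of the answer): purePlus never needs to know which representation it is given.

natCaseInt :: Integer -> r -> (Integer -> r) -> r
natCaseInt n z s = if n == 0 then z else s (n - 1)

four :: Integer
four = purePlus 0 (+ 1) natCaseInt   -- evaluates to 4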
And you could think the same thing of Haskell. There is no particular reason to think that there are things which are not functions, nor is there a particular reason to think that everything is a function. But Haskell's type system at least prevents you from applying an argument to a number (anybody thinking about fromInteger right now needs to hold their tongue! :-). In the above interpretation, it is because numbers are not necessarily modeled as functions, so you can't necessarily apply arguments to them.
In case it isn't clear by now, this whole answer has been somewhat of a technical/philosophical digression, and the easy answer to your question is "no, not everything is a function in functional languages". Functions are the things you can apply arguments to, that's all.

The "pure" in "pure functional" refers to the "freedom from side effects" kind of purity. It has little relation to the meaning of "pure" being used when people talk about a "pure object-oriented language", which simply means that the language manipulates purely (only) in objects.
The reason is that pure-as-in-only is a reasonable distinction to use to classify object-oriented languages, because there are languages like Java and C++, which clearly have values that don't have all that much in common with objects, and there are also languages like Python and Ruby, for which it can be argued that every value is an object1
Whereas for functional languages, there are no practical languages which are "pure functional" in the sense that every value the language can manipulate is a function. It's certainly possible to program in such a language. The most basic versions of the lambda calculus don't have any notion of things that are not functions, but you can still do arbitrary computation with them by coming up with ways of representing the things you want to compute on as functions.2
But while the simplicity and minimalism of the lambda calculus tends to be great for proving things about programming, actually writing substantial programs in such a "raw" programming language is awkward. The function representation of basic things like numbers also tends to be very inefficient to implement on actual physical machines.
But there is a very important distinction between languages that encourage a functional style but allow untracked side effects anywhere, and ones that actually enforce that your functions are "pure" functions (similar to mathematical functions). Object-oriented programming is very strongly wed to the use of impure computations3, so there are no practical object-oriented programming languages that are pure in this sense.
So the "pure" in "pure functional language" means something very different from the "pure" in "pure object-oriented language".4 In each case the "pure vs not pure" distinction is one that is completely uninteresting applied to the other kind of language, so there's no very strong motive to standardise the use of the term.
1 There are corner cases to pick at in all "pure object-oriented" languages that I know of, but that's not really very interesting. It's clear that the object metaphor goes much further in languages in which 1 is an instance of some class, and that class can be sub-classed, than it does in languages in which 1 is something other than an object.
2 All computation is about representation anyway. Computers don't know anything about numbers or anything else. They just have bit-patterns that we use to represent numbers, and operations on bit-patterns that happen to correspond to operations on numbers (because we designed them so that they would).
3 This isn't fundamental either. You could design a "pure" object-oriented language that was pure in this sense. I tend to write most of my OO code to be pure anyway.
4 If this seems obtuse, you might reflect that the terms "functional", "object", and "language" have vastly different meanings in other contexts also.

A very different angle on this question: all sorts of data in Haskell can be represented as functions, using a technique called Church encodings. This is a form of inversion of control: instead of passing data to functions that consume it, you hide the data inside a set of closures, and to consume it you pass in callbacks describing what to do with this data.
Any program that uses lists, for example, can be translated into a program that uses functions instead of lists:
-- | A list corresponds to a function of this type:
type ChurchList a r = (a -> r -> r)  -- ^ how to handle a cons cell
                   -> r              -- ^ how to handle the empty list
                   -> r              -- ^ result of processing the list

listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs
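Going the other way is just as direct (churchToList is my name for this helper; it is not part of the original answer):

churchToList :: ChurchList a [a] -> [a]
churchToList xs = xs (:) []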
listToCPS takes a concrete list as its starting point, but that's not necessary. You can build up ChurchList functions out of just pure functions:
-- | The empty 'ChurchList'.
nil :: ChurchList a r
nil = \f z -> z

-- | Add an element at the front of a 'ChurchList'.
cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f x (xs f z)

foldChurchList :: (a -> r -> r) -> r -> ChurchList a r -> r
foldChurchList f z xs = xs f z

mapChurchList :: (a -> b) -> ChurchList a r -> ChurchList b r
mapChurchList f xs = \g z -> foldChurchList (g . f) z xs

filterChurchList :: (a -> Bool) -> ChurchList a r -> ChurchList a r
filterChurchList pred xs = \f z ->
    foldChurchList (\x rest -> if pred x then f x rest else rest) z xs
That last function uses Bool, but of course we can replace Bool with functions as well:
-- | A Bool can be represented as a function that chooses between two
-- given alternatives.
type ChurchBool r = r -> r -> r

true, false :: ChurchBool r
true  a _ = a
false _ b = b

filterChurchList' :: (a -> ChurchBool r) -> ChurchList a r -> ChurchList a r
filterChurchList' pred xs = \f z ->
    foldChurchList (\x rest -> pred x (f x rest) rest) z xs
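To see the pieces fit together, here is a small usage sketch (the example and the isEven helper are mine, not the original answer's): Church-encode [1,2,3], keep the even elements, then fold back to an ordinary list.

isEven :: Int -> ChurchBool r
isEven n = if even n then true else false

evens :: [Int]
evens = foldChurchList (:) [] (filterChurchList' isEven (cons 1 (cons 2 (cons 3 nil))))
-- evens == [2]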
This sort of transformation can be done for basically any type, so in theory, you could get rid of all "value" types in Haskell, and keep only the () type, the (->) and IO type constructors, return and >>= for IO, and a suitable set of IO primitives. This would obviously be hella impractical—and it would perform worse (try writing tailChurchList :: ChurchList a r -> ChurchList a r for a taste).

Is getChar :: IO Char a function or not? The Haskell Report doesn't provide us with a definition. But it does state that getChar is a function (see here). (Well, at least we can say that it is a function.)
So I think the answer is YES.
I don't think there can be a correct definition of "function" except "everything is a function". (What is a "correct definition"? Good question...) Consider the following example:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
f :: Applicative f => f Int
f = pure 1
g1 :: Maybe Int
g1 = f
g2 :: Int -> Int
g2 = f
Is f a function or datatype? It depends.

Related

What are real use cases of currying?

I've been reading lots of articles on currying, but almost all of them are misleading, explaining currying as partial function application, and almost all of the examples are about functions with an arity of 2, like an add function or something.
Also, many implementations of a curry function in JavaScript let it accept more than one argument per partial application (see lodash), whereas the Wikipedia article clearly states that currying is about:
translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument (partial application)
So basically currying is a series of partial applications each with a single argument. And I really want to know real uses of that, in any language.
The real use case of currying is partial application.
Currying by itself is not terribly interesting. What's interesting is if your programming language supports currying by default, as is the case in F# or Haskell.
You can define higher-order functions for currying and partial application in any language that supports first-class functions, but it's a far cry from the flexibility you get when every function is curried by default, and thus partially applicable without you having to do anything.
So if you see people conflating currying and partial application, that's because of how closely those concepts are tied there - since currying is ubiquitous, you don't really need other forms of partial application than applying curried functions to consecutive arguments.
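A tiny illustration (my example, not part of the answer): in Haskell every function is curried, so partial application needs no special syntax.

add :: Int -> Int -> Int
add x y = x + y

increment :: Int -> Int
increment = add 1          -- partially applied

main :: IO ()
main = print (map increment [1, 2, 3])   -- prints [2,3,4]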
It is useful to pass context.
Consider the 'map' function. It takes a function as argument:
map : (a -> b) -> [a] -> [b]
Given a function which uses some form of context:
f : SomeContext -> a -> b
This means you can elegantly use the map function without having to state the 'a'-argument:
map (f actualContext) [1,2,3]
Without currying, you would have to use a lambda:
map (\a -> f actualContext a) [1,2,3]
Notes:
map is a function which takes a function f and a list of values of type a. It constructs a new list by applying f to each element, resulting in a list of values of type b.
e.g. map (+1) [1,2,3] = [2,3,4]
The bearing currying has on code can be divided into two sets of issues (I use Haskell to illustrate): syntactic and implementation-related.
Syntax Issue 1:
Currying allows greater code clarity in certain cases.
What does clarity mean? Reading the function provides clear indication of its functionality.
e.g. The map function.
map : (a -> b) -> ([a] -> [b])
Read in this way, we see that map is a higher order function that lifts a function transforming as to bs to a function that transforms [a] to [b].
This intuition is particularly useful when understanding such expressions.
map (map (+1))
The inner map has the type given above, [a] -> [b].
In order to figure out the type of the outer map, we recursively apply our intuition from above. The outer map thus lifts [a] -> [b] to [[a]] -> [[b]].
This intuition will carry you forward a LOT.
Once we generalize map over into fmap, a map over arbitrary containers, it becomes really easy to read expressions like so (Note I've monomorphised the type of each fmap to a different type for the sake of the example).
showInt : Int -> String
(fmap . fmap . fmap) showInt : Tree (Set [Int]) -> Tree (Set [String])
Hopefully the above illustrates that fmap provides this generalized notion of lifting vanilla functions into functions over some arbitrary container.
Syntax Issue 2:
Currying also allows us to express functions in point-free form.
nthSmallest : Int -> [Int] -> Maybe Int
nthSmallest n = safeHead . drop n . sort
safeHead (x:_) = Just x
safeHead _ = Nothing
The above is usually considered good style as it illustrates thinking in terms of a pipeline of functions rather than the explicit manipulation of data.
Implementation:
In Haskell, point free style (through currying) can help us optimize functions. Writing a function in point free form will allow us to memoize it.
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)

not_memoized_fib :: Int -> Integer
not_memoized_fib x = map fib [0 ..] !! x
  where fib 0 = 0
        fib 1 = 1
        fib n = not_memoized_fib (n-2) + not_memoized_fib (n-1)
Writing it as a curried function as in the memoized version treats the curried function as an entity and therefore memoizes it.
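A quick way to observe the difference (this main is my example, not the answer's):

main :: IO ()
main = do
  print (memoized_fib 35)      -- fast: every call shares one lazily-filled list
  print (not_memoized_fib 35)  -- exponential: each call rebuilds its own list,
                               -- so this takes dramatically longer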

How does term-rewriting based evaluation work?

The Pure programming language is apparently based on term rewriting, instead of the lambda-calculus that traditionally underlies similar-looking languages.
...what qualitative, practical difference does this make? In fact, what is the difference in the way that it evaluates expressions?
The linked page provides a lot of examples of term rewriting being useful, but it doesn't actually describe what it does differently from function application, except that it has rather flexible pattern matching (and pattern matching as it appears in Haskell and ML is nice, but not fundamental to the evaluation strategy). Values are matched against the left side of a definition and substituted into the right side - isn't this just beta reduction?
The matching of patterns, and substitution into output expressions, superficially looks a bit like syntax-rules to me (or even the humble #define), but the main feature of that is obviously that it happens before rather than during evaluation, whereas Pure is fully dynamic and there is no obvious phase separation in its evaluation system (and in fact otherwise Lisp macro systems have always made a big noise about how they are not different from function application). Being able to manipulate symbolic expression values is cool'n'all, but also seems like an artifact of the dynamic type system rather than something core to the evaluation strategy (pretty sure you could overload operators in Scheme to work on symbolic values; in fact you can even do it in C++ with expression templates).
So what is the mechanical/operational difference between term rewriting (as used by Pure) and traditional function application, as the underlying model of evaluation, when substitution happens in both?
Term rewriting doesn't have to look anything like function application, but languages like Pure emphasise this style because a) beta-reduction is simple to define as a rewrite rule and b) functional programming is a well-understood paradigm.
A counter-example would be a blackboard or tuple-space paradigm, which term-rewriting is also well-suited for.
One practical difference between beta-reduction and full term-rewriting is that rewrite rules can operate on the definition of an expression, rather than just its value. This includes pattern-matching on reducible expressions:
-- Functional style
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- Compose f and g before mapping, to prevent traversing xs twice
result = map (compose f g) xs
-- Term-rewriting style: spot double-maps before they're reduced
map f (map g xs) = map (compose f g) xs
map f nil = nil
map f (cons x xs) = cons (f x) (map f xs)
-- All double maps are now automatically fused
result = map f (map g xs)
Notice that we can do this with LISP macros (or C++ templates), since they are a term-rewriting system, but this style blurs LISP's crisp distinction between macros and functions.
CPP's #define isn't equivalent, since it's not safe or hygienic (syntactically valid programs can become invalid after pre-processing).
We can also define ad-hoc clauses for existing functions as we need them, e.g.:
plus (times x y) (times x z) = times x (plus y z)
Another practical consideration is that rewrite rules must be confluent if we want deterministic results, i.e. we get the same result regardless of the order in which we apply the rules. No algorithm can check this for us (it's undecidable in general) and the search space is far too large for individual tests to tell us much. Instead we must convince ourselves that our system is confluent by some formal or informal proof; one way would be to follow systems which are already known to be confluent.
For example, beta-reduction is known to be confluent (via the Church-Rosser Theorem), so if we write all of our rules in the style of beta-reductions then we can be confident that our rules are confluent. Of course, that's exactly what functional programming languages do!
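To make the confluence worry concrete, here is a tiny sketch of my own (in the same rule style as above) of a system that is not confluent:

-- Both rules match inside the term "f (g nil)", and the two reducts can never
-- be brought back together.
f (g xs) = nil
g nil    = cons nil nil
-- Reducing at the root:        f (g nil) -> nil
-- Reducing the inner "g nil":  f (g nil) -> f (cons nil nil), which is stuck.
-- Two different normal forms, so this rule set is not confluent.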

What is so special about Monads?

A monad is a mathematical structure which is heavily used in (pure) functional programming, basically Haskell. However, there are many other mathematical structures available, like for example applicative functors, strong monads, or monoids. Some are more specific, some are more generic. Yet, monads are much more popular. Why is that?
One explanation I came up with, is that they are a sweet spot between genericity and specificity. This means monads capture enough assumptions about the data to apply the algorithms we typically use and the data we usually have fulfills the monadic laws.
Another explanation could be that Haskell provides syntax for monads (do-notation), but not for other structures, which means Haskell programmers (and thus functional programming researchers) are intuitively drawn towards monads, where a more generic or specific (efficient) function would work as well.
I suspect that the disproportionately large attention given to this one particular type class (Monad) over the many others is mainly a historical fluke. People often associate IO with Monad, although the two are independently useful ideas (as are list reversal and bananas). Because IO is magical (having an implementation but no denotation) and Monad is often associated with IO, it's easy to fall into magical thinking about Monad.
(Aside: it's questionable whether IO even is a monad. Do the monad laws hold? What do the laws even mean for IO, i.e., what does equality mean? Note the problematic association with the state monad.)
If a type m :: * -> * has a Monad instance, you get Turing-complete composition of functions with type a -> m b. This is a fantastically useful property. You get the ability to abstract various Turing-complete control flows away from specific meanings. It's a minimal composition pattern that supports abstracting any control flow for working with types that support it.
Compare this to Applicative, for instance. There, you get only composition patterns with computational power equivalent to a push-down automaton. Of course, it's true that more types support composition with more limited power. And it's true that when you limit the power available, you can do additional optimizations. These two reasons are why the Applicative class exists and is useful. But things that can be instances of Monad usually are, so that users of the type can perform the most general operations possible with the type.
Edit:
By popular demand, here are some functions using the Monad class:
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
(*&&) :: Monad m => m Bool -> m Bool -> m Bool
x *&& y = ifM x y (return False)
(*||) :: Monad m => m Bool -> m Bool -> m Bool
x *|| y = ifM x (return True) y
notM :: Monad m => m Bool -> m Bool
notM x = x >>= return . not
Combining those with do syntax (or the raw >>= operator) gives you name binding, indefinite looping, and complete boolean logic. That's a well-known set of primitives sufficient to give Turing completeness. Note how all the functions have been lifted to work on monadic values, rather than simple values. All monadic effects are bound only when necessary - only the effects from the chosen branch of ifM are bound into its final value. Both *&& and *|| ignore their second argument when possible. And so on..
Now, those type signatures may not involve functions for every monadic operand, but that's just a cognitive simplification. There would be no semantic difference, ignoring bottoms, if all the non-function arguments and results were changed to () -> m a. It's just friendlier to users to optimize that cognitive overhead out.
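A small usage sketch (mine, not part of the answer), using the whileM defined above to run a countdown loop in IO:

countdown :: Int -> IO Int
countdown = whileM
    (\n -> return (n > 0))
    (\n -> do
        putStrLn ("n = " ++ show n)
        return (n - 1))
-- countdown 3 prints n = 3, n = 2, n = 1 and returns 0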
Now, let's look at what happens to those functions with the Applicative interface.
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Well, uh. It got the same type signature. But there's a really big problem here already. The effects of both x and y are bound into the composed structure, regardless of which one's value is selected.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <$> step x) (pure x)
Well, ok, that seems like it'd be ok, except for the fact that it's an infinite loop because ifA will always execute both branches... Except it's not even that close. pure x has the type f a. whileA p step <$> step x has the type f (f a). This isn't even an infinite loop. It's a compile error. Let's try again..
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well shoot. Don't even get that far. whileA p step has the type a -> f a. If you try to use it as the first argument to <*>, it grabs the Applicative instance for the top type constructor, which is (->), not f. Yeah, this isn't gonna work either.
In fact, the only function from my Monad examples that would work with the Applicative interface is notM. That particular function works just fine with only a Functor interface, in fact. The rest? They fail.
Of course it's to be expected that you can write code using the Monad interface that you can't with the Applicative interface. It is strictly more powerful, after all. But what's interesting is what you lose. You lose the ability to compose functions that change what effects they have based on their input. That is, you lose the ability to write certain control-flow patterns that compose functions with types a -> f b.
Turing-complete composition is exactly what makes the Monad interface interesting. If it didn't allow Turing-complete composition, it would be impossible for you, the programmer, to compose together IO actions in any particular control flow that wasn't nicely prepackaged for you. It was the fact that you can use the Monad primitives to express any control flow that made the IO type a feasible way to manage the IO problem in Haskell.
Many more types than just IO have semantically valid Monad interfaces. And it happens that Haskell has the language facilities to abstract over the entire interface. Due to those factors, Monad is a valuable class to provide instances for, when possible. Doing so gets you access to all the existing abstract functionality provided for working with monadic types, regardless of what the concrete type is.
So if Haskell programmers seem to always care about Monad instances for a type, it's because it's the most generically-useful instance that can be provided.
First, I think that it is not quite true that monads are much more popular than anything else; both Functor and Monoid have many instances that are not monads. But they are both very specific; Functor provides mapping, Monoid concatenation. Applicative is the one class that I can think of that is probably underused given its considerable power, due largely to its being a relatively recent addition to the language.
But yes, monads are extremely popular. Part of that is the do notation; a lot of Monoids provide Monad instances that merely append values to a running accumulator (essentially an implicit writer). The blaze-html library is a good example. The reason, I think, is the power of the type signature (>>=) :: Monad m => m a -> (a -> m b) -> m b. While fmap and mappend are useful, what they can do is fairly narrowly constrained. bind, however, can express a wide variety of things. It is, of course, canonized in the IO monad, perhaps the best pure functional approach to IO before streams and FRP (and still useful beside them for simple tasks and defining components). But it also provides implicit state (Reader/Writer/ST), which can avoid some very tedious variable passing. The various state monads, especially, are important because they provide a guarantee that state is single-threaded, allowing mutable structures in pure (non-IO) code before fusion.
But bind has some more exotic uses, such as flattening nested data structures (the List and Set monads), both of which are quite useful in their place (and I usually see them used desugared, calling liftM or (>>=) explicitly, so it is not a matter of do notation).
So while Functor and Monoid (and the somewhat rarer Foldable, Alternative, Traversable, and others) provide a standardized interface to a fairly straightforward function, Monad's bind is considerably more flexible.
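To make the "flattening" point concrete, here is a small sketch (mine, not the answer's) of bind in the list monad:

pairs :: [(Int, Char)]
pairs = [1, 2] >>= \n -> ['a', 'b'] >>= \c -> return (n, c)
-- pairs == [(1,'a'),(1,'b'),(2,'a'),(2,'b')]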
In short, I think that all your reasons have some role; the popularity of monads is due to a combination of historical accident (do notation and the late definition of Applicative) and their combination of power and generality (relative to functors, monoids, and the like) and understandability (relative to arrows).
Well, first let me explain what the role of monads is. Monads are very powerful, in a certain sense: you can pretty much express anything using a monad. Haskell as a language doesn't have things like action loops, exceptions, mutation, goto, etc. Monads can be expressed within the language (so they are not special) and make all of these available.
There is a positive and a negative side to this: it's positive that you can express all those control structures you know from imperative programming, and a whole bunch of them you don't. I have just recently developed a monad that lets you reenter a computation somewhere in the middle with a slightly changed context. That way you can run a computation, and if it fails, you just try again with slightly adjusted values. Furthermore, monadic actions are first class, and that's how you build things like loops or exception handling. While while is primitive in C, in Haskell it's actually just a regular function.
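For instance, a minimal sketch (mine) of while as an ordinary function over monadic actions:

while :: Monad m => m Bool -> m () -> m ()
while cond body = do
  ok <- cond
  if ok
    then body >> while cond body
    else return ()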
The negative side is that monads give you pretty much no guarantees whatsoever. They are so powerful that you are allowed to do whatever you want, to put it simply. In other words, just as in imperative languages, it can be hard to reason about code by just looking at it.
The more general abstractions are more general in the sense that they allow some concepts to be expressed which you can't express as monads. But that's only part of the story. Even for monads you can use a style known as applicative style, in which you use the applicative interface to compose your program from small isolated parts. The benefit of this is that you can reason about code by just looking at it and you can develop components without having to pay attention to the rest of your system.
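A small sketch (my example, not the answer's) of that applicative style: the parts are built and validated independently and only combined at the end.

import Control.Applicative

data Person = Person String Int deriving Show

mkPerson :: Maybe String -> Maybe Int -> Maybe Person
mkPerson name age = Person <$> name <*> age
-- mkPerson (Just "Ada") (Just 36) == Just (Person "Ada" 36)
-- mkPerson Nothing (Just 36)      == Nothing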
What is so special about monads?
The monadic interface's main claim to fame in Haskell is its role in the replacement of the original and unwieldy dialogue-based I/O mechanism.
As for their status in a formal investigative context...it is merely an iteration of a seemingly-cyclic endeavour which is now (2021 Oct) approximately one half-century old:
During the 1960s, several researchers began work on proving things about programs. Efforts were made to prove that:
A program was correct.
Two programs with different code computed the same answers when given the same inputs.
One program was faster than another.
A given program would always terminate.
While these are abstract goals, they are all, really, the same as the practical goal of "getting the program debugged".
Several difficult problems emerged from this work. One was the problem of specification: before one can prove that a program is correct, one must specify the meaning of "correct", formally and unambiguously. Formal systems for specifying the meaning of a program were developed, and they looked suspiciously like programming languages.
The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky.
(emphasis by me.)
...back when "programming languages" - apart from an intrepid few - were most definitely imperative.
Anyone for elevating this mystery to the rank of Millennium problem? Solving it would definitely advance the science of computing and the engineering of software, one way or the other...
Monads are special because of do notation, which lets you write imperative programs in a functional language. Monad is the abstraction that allows you to splice together imperative programs from smaller, reusable components (which are themselves imperative programs). Monad transformers are special because they represent enhancing an imperative language with new features.
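A brief sketch (mine, not the answer's) of that last point about monad transformers, using StateT via the mtl package to layer a counter on top of IO:

import Control.Monad.State  -- from the mtl package

tick :: StateT Int IO Int
tick = do
  n <- get
  put (n + 1)
  return n

main :: IO ()
main = do
  (xs, final) <- runStateT (sequence [tick, tick, tick]) 0
  print xs      -- [0,1,2]
  print final   -- 3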

How do I implement graphs and graph algorithms in a functional programming language?

Basically, I know how to create graph data structures and use Dijkstra's algorithm in programming languages where side effects are allowed. Typically, graph algorithms use a structure to mark certain nodes as 'visited', but this has side effects, which I'm trying to avoid.
I can think of one way to implement this in a functional language, but it basically requires passing around large amounts of state to different functions, and I'm wondering if there is a more space-efficient solution.
You might check out how Martin Erwig's Haskell functional graph library does things. For instance, its shortest-path functions are all pure, and you can see the source code for how it's implemented.
Another option, as fmark mentioned, is to use an abstraction which allows you to implement pure functions in terms of state. He mentions the State monad (which is available in both lazy and strict varieties). If you're working in the GHC Haskell compiler/interpreter (or, I think, any Haskell implementation which supports rank-2 types), yet another option is the ST monad, which allows you to write pure functions that deal with mutable variables internally.
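A minimal sketch (my example) of the ST technique just described: the function mutates an STRef internally, yet its interface is pure.

import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST (do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef ref (+ x)) xs
  readSTRef ref)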
If you were using Haskell, the only functional language with which I am familiar, I would recommend using the State monad. The State monad is an abstraction for a function that takes a state and returns an intermediate value and some new state value. This is considered idiomatic Haskell for those situations where maintaining a large state is necessary.
It is a much nicer alternative to the naive "return state as a function result and pass it as a parameter" idiom that is emphasized in beginner functional programming tutorials. I imagine most functional programming languages have a similar construct.
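As a sketch of what that looks like for the "visited" set (the names and graph representation are my assumptions, not the answer's code):

import Control.Monad.State
import qualified Data.Map as Map
import qualified Data.Set as Set

type Graph = Map.Map Int [Int]

-- Depth-first traversal that threads the visited set through the State monad.
dfs :: Graph -> Int -> State (Set.Set Int) [Int]
dfs g n = do
  seen <- get
  if n `Set.member` seen
    then return []
    else do
      modify (Set.insert n)
      rest <- mapM (dfs g) (Map.findWithDefault [] n g)
      return (n : concat rest)

-- runState (dfs g start) Set.empty gives the visit order plus the final set.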
I just keep the visited set as a set and pass it as a parameter. There are efficient log-time implementations of sets of any ordered type and extra-efficient sets of integers.
To represent a graph I use adjacency lists, or I'll use a finite map that maps each node to a list of its successors. It depends what I want to do.
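A sketch (mine, not the answer's code) of that approach, with the visited set passed explicitly and returned updated:

import qualified Data.Map as Map
import qualified Data.Set as Set

type Graph a = Map.Map a [a]

dfs :: Ord a => Graph a -> a -> Set.Set a -> (Set.Set a, [a])
dfs g n seen
  | n `Set.member` seen = (seen, [])
  | otherwise           = foldl step (Set.insert n seen, [n]) (Map.findWithDefault [] n g)
  where
    step (s, acc) m = let (s', found) = dfs g m s in (s', acc ++ found)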
Rather than Abelson and Sussman, I recommend Chris Okasaki's Purely Functional Data Structures. I've linked to Chris's dissertation, but if you have the money, he expanded it into an excellent book.
Just for grins, here's a slightly scary reverse postorder depth-first search done in continuation-passing style in Haskell. This is straight out of the Hoopl optimizer library:
postorder_dfs_from_except :: forall block e . (NonLocal block, LabelsPtr e)
                          => LabelMap (block C C) -> e -> LabelSet -> [block C C]
postorder_dfs_from_except blocks b visited =
  vchildren (get_children b) (\acc _visited -> acc) [] visited
  where
    vnode :: block C C -> ([block C C] -> LabelSet -> a)
                       -> ([block C C] -> LabelSet -> a)
    vnode block cont acc visited =
        if setMember id visited then
            cont acc visited
        else
            let cont' acc visited = cont (block:acc) visited in
            vchildren (get_children block) cont' acc (setInsert id visited)
      where id = entryLabel block
    vchildren bs cont acc visited = next bs acc visited
      where next children acc visited =
                case children of
                  []     -> cont acc visited
                  (b:bs) -> vnode b (next bs) acc visited
    get_children block = foldr add_id [] $ targetLabels block
    add_id id rst = case lookupFact id blocks of
                      Just b  -> b : rst
                      Nothing -> rst
Here is a Swift example. You might find this a bit more readable. The variables are actually descriptively named, unlike the super cryptic Haskell examples.
https://github.com/gistya/Functional-Swift-Graph
Most functional languages support inner functions, so you can create your graph representation in the outermost layer and simply reference it from the inner function.
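A small sketch (my example) of what that looks like: the adjacency map is bound once in the outer function, and the inner helper just closes over it.

import qualified Data.Map as Map

degreeTable :: [(Int, [Int])] -> [(Int, Int)]
degreeTable edges = map degreeOf (Map.keys graph)
  where
    graph = Map.fromList edges
    degreeOf n = (n, length (Map.findWithDefault [] n graph))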
This book covers it extensively: http://www.amazon.com/gp/product/0262510871
I would love to hear about some really clever technique, but I think there are two fundamental approaches:
Modify some global state object, i.e. side effects.
Pass the graph as an argument to your functions, with the return value being the modified graph. I assume this is your approach of "passing around large amounts of state".
That is what's done in functional programming. If the compiler/interpreter is any good, it will help manage memory for you. In particular, you'll want to make sure that you use tail recursion, if you happen to recurse in any of your functions.

Functional Programming for Basic Algorithms

How good is 'pure' functional programming for basic routine implementations, e.g. list sorting, string matching etc.?
It's common to implement such basic functions within the base interpreter of any functional language, which means that they will be written in an imperative language (C/C++). Although there are many exceptions...
At least, I wish to ask: how difficult is it to emulate imperative style while coding in a 'pure' functional language?
How good is 'pure' functional programming for basic routine implementations, e.g. list sorting, string matching etc.?
Very. I'll do your problems in Haskell, and I'll be slightly verbose about it. My aim is not to convince you that the problem can be done in 5 characters (it probably can in J!), but rather to give you an idea of the constructs.
import Data.List -- for `sort`
stdlistsorter :: (Ord a) => [a] -> [a]
stdlistsorter list = sort list
Sorting a list using the sort function from Data.List
import Data.List -- for `delete`
selectionsort :: (Ord a) => [a] -> [a]
selectionsort [] = []
selectionsort list = minimum list : (selectionsort . delete (minimum list) $ list)
Selection sort implementation.
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
    let smallerSorted = quicksort [a | a <- xs, a <= x]
        biggerSorted  = quicksort [a | a <- xs, a > x]
    in  smallerSorted ++ [x] ++ biggerSorted
Quick sort implementation.
import Data.List -- for `isInfixOf`
stdstringmatch :: (Eq a) => [a] -> [a] -> Bool
stdstringmatch list1 list2 = list1 `isInfixOf` list2
String matching using the isInfixOf function from Data.List.
It's common to implement such basic functions within the base interpreter of any functional language, which means that they will be written in an imperative language (C/C++). Although there are many exceptions...
Depends. Some functions are more naturally expressed imperatively. However, I hope I have convinced you that some algorithms are also expressed naturally in a functional way.
At least, I wish to ask: how difficult is it to emulate imperative style while coding in a 'pure' functional language?
It depends on how hard you find Monads in Haskell. Personally, I find it quite difficult to grasp.
1) Good by what standard? What properties do you desire?
List sorting? Easy. Let's do Quicksort in Haskell:
sort [] = []
sort (x:xs) = sort (filter (< x) xs) ++ [x] ++ sort (filter (>= x) xs)
This code has the advantage of being extremely easy to understand. If the list is empty, it's sorted. Otherwise, call the first element x, find elements less than x and sort them, find elements greater than x and sort those. Then concatenate the sorted lists with x in the middle. Try making that look comprehensible in C++.
Of course, Mergesort is much faster for sorting linked lists, but the code is also 6 times longer.
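For comparison, here is a minimal mergesort sketch (my code, not the answer's), noticeably longer than the quicksort above:

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort left) (msort right)
  where
    (left, right) = splitAt (length xs `div` 2) xs
    merge [] ys = ys
    merge ys [] = ys
    merge (a:as) (b:bs)
      | a <= b    = a : merge as (b:bs)
      | otherwise = b : merge (a:as) bs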
2) It's extremely easy to implement imperative style while staying purely functional. The essence of imperative style is sequencing of actions. Actions are sequenced in a pure setting by using monads. The essence of monads is the binding function:
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
This function exists in C++, and it's called ;.
A sequence of actions in Haskell, for example, is written thusly:
putStrLn "What's your name?" >>=
const (getLine >>= \name -> putStrLn ("Hello, " ++ name))
Some syntax sugar is available to make this look more imperative (but note that this is the exact same code):
do {
putStrLn "What's your name?";
name <- getLine;
putStrLn ("Hello, " ++ name);
}
Nearly all functional programming languages have some construct to allow for imperative coding (like do in Haskell). There are many problem domains that can't be handled well with "pure" functional programming. One of those is network protocols, for example, where you need a series of commands in the right order. Such things don't lend themselves well to pure functional programming.
I have to agree with Lothar, though, that list sorting and string matching are not really examples you need to solve imperatively. There are well-known algorithms for such things and they can be implemented efficiently in functional languages already.
I think that 'algorithms' (e.g. method bodies and basic data structures) are where functional programming is best. Assuming nothing completely IO/state-dependent, functional programming excels at authoring algorithms and data structures, often resulting in shorter/simpler/cleaner code than you'd get with an imperative solution. (Don't emulate imperative style; FP style is better for most of these kinds of tasks.)
You want imperative stuff sometimes to deal with IO or low-level performance, and you want OOP for partitioning the high-level design and architecture of a large program, but "in the small" where you write most of your code, FP is a win.
See also
How does functional programming affect the structure of your code?
It works pretty well the other way round, emulating functional style with imperative code.
Remember that the internals of an interpreter or VM are so close to the metal and so performance-critical that you should even consider going down to assembler level and counting the clock cycles for each instruction (as Dolphin Smalltalk does, and the results are impressive).
CPUs are imperative.
But there is no problem implementing all the basic algorithms - the ones you mention are NOT low level; they are basics.
I don't know about list sorting, but you'd be hard pressed to bootstrap a language without some kind of string matching in the compiler or runtime. So you need that routine to create the language. As there isn't a great deal of point writing the same code twice, when you create the library for matching strings within the language, you call the code written earlier. The degree to which this happens in successive releases will depend on how self-hosting the language is, but unless that's a strong design goal there won't be any reason to change it.

Resources