Advantages of stateless programming? - functional-programming

I've recently been learning about functional programming (specifically Haskell, but I've gone through tutorials on Lisp and Erlang as well). While I found the concepts very enlightening, I still don't see the practical side of the "no side effects" concept. What are the practical advantages of it? I'm trying to think in the functional mindset, but there are some situations that just seem overly complex without the ability to save state in an easy way (I don't consider Haskell's monads 'easy').
Is it worth continuing to learn Haskell (or another purely functional language) in-depth? Is functional or stateless programming actually more productive than procedural? Is it likely that I will continue to use Haskell or another functional language later, or should I learn it only for the understanding?
I care less about performance than productivity. So I'm mainly asking if I will be more productive in a functional language than a procedural/object-oriented/whatever.

Read Functional Programming in a Nutshell.
There are lots of advantages to stateless programming, not least of which is dramatically simpler multithreaded and concurrent code. To put it bluntly, mutable state is the enemy of multithreaded code. If values are immutable by default, programmers don't need to worry about one thread mutating the value of shared state between two threads, so it eliminates a whole class of multithreading bugs related to race conditions. Since there are no race conditions, there's no reason to use locks either, so immutability eliminates another whole class of bugs related to deadlocks as well.
That's the big reason why functional programming matters, and probably the best one for jumping on the functional programming train. There are also lots of other benefits, including simplified debugging (i.e. functions are pure and do not mutate state in other parts of an application), more terse and expressive code, less boilerplate code compared to languages which are heavily dependent on design patterns, and the compiler can more aggressively optimize your code.
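To make the thread-safety point concrete, here is a minimal sketch in Python (not one of the languages discussed above, chosen only for brevity). The names PRICES, with_tax and total_with_tax are made up for illustration. The shared input is an immutable tuple and the worker function is pure, so no lock is needed:

```python
from concurrent.futures import ThreadPoolExecutor

# Immutable shared input: no worker thread can modify a tuple in place.
PRICES = (120, 80, 45, 300)  # hypothetical example data


def with_tax(price: int, rate_percent: int = 20) -> int:
    """Pure function: the result depends only on the arguments."""
    return price + price * rate_percent // 100


def total_with_tax(prices: tuple) -> int:
    # Each worker reads one element and returns a new value, so there is
    # no shared mutable state and therefore nothing to lock.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(with_tax, prices))
```

Calling total_with_tax(PRICES) gives the same answer no matter how the threads are scheduled, because nothing they touch can change underneath them.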

The more pieces of your program are stateless, the more ways there are to put pieces together without having anything break. The power of the stateless paradigm lies not in statelessness (or purity) per se, but the ability it gives you to write powerful, reusable functions and combine them.
You can find a good tutorial with lots of examples in John Hughes's paper Why Functional Programming Matters (PDF).
You will be gobs more productive, especially if you pick a functional language that also has algebraic data types and pattern matching (Caml, SML, Haskell).

Many of the other answers have focused on the performance (parallelism) side of functional programming, which I believe is very important. However, you did specifically ask about productivity, as in, can you program the same thing faster in a functional paradigm than in an imperative paradigm.
I actually find (from personal experience) that programming in F# matches the way I think better, and so it's easier. I think that's the biggest difference. I've programmed in both F# and C#, and there's a lot less "fighting the language" in F#, which I love. You don't have to think about the details in F#. Here's a few examples of what I've found I really enjoy.
For example, even though F# is statically typed (all types are resolved at compile time), the type inference figures out what types you have, so you don't have to say it. And if it can't figure it out, it automatically makes your function/class/whatever generic. So you never have to write any generic whatever, it's all automatic. I find that means I'm spending more time thinking about the problem and less about how to implement it. In fact, whenever I come back to C#, I find I really miss this type inference; you never realise how distracting writing out types is until you don't need to do it anymore.
Also in F#, instead of writing loops, you call functions. It's a subtle change, but significant, because you don't have to think about the loop construct anymore. For example, here's a piece of code which would go through and match something (I can't remember what, it's from a project Euler puzzle):
let matchingFactors =
    factors
    |> Seq.filter (fun x -> largestPalindrome % x = 0)
    |> Seq.map (fun x -> (x, largestPalindrome / x))
I realise that doing a filter then a map (that's a conversion of each element) in C# would be quite simple, but you have to think at a lower level. Particularly, you'd have to write the loop itself, and have your own explicit if statement, and those kinds of things. Since learning F#, I've realised I've found it easier to code in the functional way, where if you want to filter, you write "filter", and if you want to map, you write "map", instead of implementing each of the details.
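For readers who don't know F#, the same contrast can be sketched in Python. This is only an illustration under my own assumptions: the function names and the target parameter are hypothetical stand-ins for the F# snippet's factors and largestPalindrome.

```python
def matching_factors_loop(factors, target):
    # Imperative version: explicit loop, explicit if, explicit accumulator.
    result = []
    for x in factors:
        if target % x == 0:
            result.append((x, target // x))
    return result


def matching_factors(factors, target):
    # Declarative version: say "filter" and "map" directly.
    divisors = filter(lambda x: target % x == 0, factors)
    return list(map(lambda x: (x, target // x), divisors))
```

Both functions return the same pairs; the second one just names the two steps instead of spelling out how to perform them.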
I also love the |> operator, which I think separates F# from OCaml, and possibly other functional languages. It's the pipe operator: it lets you "pipe" the output of one expression into the input of another expression. It makes the code follow how I think more. Like in the code snippet above, that's saying, "take the factors sequence, filter it, then map it." It's a very high level of thinking, which you don't get in an imperative programming language because you're so busy writing the loop and if statements. It's the one thing I miss the most whenever I go into another language.
So just in general, even though I can program in both C# and F#, I find it easier to use F# because you can think at a higher level. I would argue that because the smaller details are removed from functional programming (in F# at least), I am more productive.
Edit: I saw in one of the comments that you asked for an example of "state" in a functional programming language. F# can be written imperatively, so here's a direct example of how you can have mutable state in F#:
let mutable x = 5
for i in 1..10 do
    x <- x + i

Consider all the difficult bugs you've spent a long time debugging.
Now, how many of those bugs were due to "unintended interactions" between two separate components of a program? (Nearly all threading bugs have this form: races involving writing shared data, deadlocks, ... Additionally, it is common to find libraries that have some unexpected effect on global state, or read/write the registry/environment, etc.) I would posit that at least 1 in 3 'hard bugs' fall into this category.
Now if you switch to stateless/immutable/pure programming, all those bugs go away. You are presented with some new challenges instead (e.g. when you do want different modules to interact with the environment), but in a language like Haskell, those interactions get explicitly reified into the type system, which means you can just look at the type of a function and reason about the type of interactions it can have with the rest of the program.
That's the big win from 'immutability' IMO. In an ideal world, we'd all design terrific APIs and even when things were mutable, effects would be local and well-documented and 'unexpected' interactions would be kept to a minimum. In the real world, there are lots of APIs that interact with global state in myriad ways, and these are the source of the most pernicious bugs. Aspiring to statelessness is aspiring to be rid of unintended/implicit/behind-the-scenes interactions among components.

One advantage of stateless functions is that they permit precalculation or caching of the function's return values. Even some C compilers allow you to explicitly mark functions as stateless to improve their optimisability. As many others have noted, stateless functions are much easier to parallelise.
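As a small illustration of the caching point, here is a sketch in Python using functools.lru_cache from the standard library. The collatz_steps function is just an example I picked; the only thing that matters is that it is pure, so a cached result can never go stale.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def collatz_steps(n: int) -> int:
    """Number of Collatz steps needed to reach 1 (for n >= 1).

    The result depends only on n, so it is safe to memoise.
    """
    if n == 1:
        return 0
    next_n = n // 2 if n % 2 == 0 else 3 * n + 1
    return 1 + collatz_steps(next_n)
```

If the function read or wrote hidden state, the cache could silently return wrong answers, which is why this optimisation is only safe for stateless functions.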
But efficiency is not the only concern. A pure function is easier to test and debug since anything that affects it is explicitly stated. And when programming in a functional language, one gets in the habit of making as few functions "dirty" (with I/O, etc.) as possible. Separating out the stateful stuff this way is a good way to design programs, even in not-so-functional languages.
Functional languages can take a while to "get", and it's difficult to explain to someone who hasn't gone through that process. But most people who persist long enough finally realise that the fuss is worth it, even if they don't end up using functional languages much.

Without state, it is very easy to automatically parallelize your code (as CPUs are made with more and more cores this is very important).

Stateless web applications are essential when you start having higher traffic.
There could be plenty of user data that you don't want to store on the client side, for security reasons for example. In this case you need to store it server-side. You could use the web application's default session, but if you have more than one instance of the application you will need to make sure that each user is always directed to the same instance.
Load balancers often have the ability to have 'sticky sessions', where the load balancer somehow knows which server to send the user's request to. This is not ideal though; for example, it means that every time you restart your web application, all connected users will lose their session.
A better approach is to store the session behind the web servers in some sort of data store. These days there are loads of great NoSQL products available for this (Redis, MongoDB, Elasticsearch, Memcached). This way the web servers are stateless but you still have state server-side, and the availability of this state can be managed by choosing the right datastore setup. These data stores usually have great redundancy, so it should almost always be possible to make changes to your web application and even the data store without impacting the users.
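A rough sketch of that arrangement in Python is below. Everything here is hypothetical: InMemoryStore is only a stand-in for a real external store, and its load/save interface is something I made up, not the API of any particular product.

```python
class InMemoryStore:
    """Stand-in for an external session store (hypothetical interface)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate the stored session.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


def handle_add_to_cart(store, session_id, item):
    """Request handler that keeps no state on the web server itself."""
    session = store.load(session_id)
    cart = session.get("cart", []) + [item]
    store.save(session_id, {**session, "cart": cart})
    return cart
```

Because the handler only talks to the store, any server instance can serve any request, and restarting one instance loses nothing.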

My understanding is that FP also has a huge impact on testing. Not having a mutable state will often force you to supply more data to a function than you would have to for a class. There are tradeoffs, but think about how easy it would be to test a function that is "incrementNumberByN" rather than a "Counter" class.
Object
describe("counter", () => {
  it("should increment the count by one when 'increment' invoked without argument", () => {
    const counter = new Counter(0)
    counter.increment()
    expect(counter.count).toBe(1)
  })
  it("should increment the count by n when 'increment' invoked with argument", () => {
    const counter = new Counter(0)
    counter.increment(2)
    expect(counter.count).toBe(2)
  })
})
Functional
describe("incrementNumberBy(startingNumber, increment)", () => {
  it("should increment by 1 if n not supplied", () => {
    expect(incrementNumberBy(0)).toBe(1)
  })
  it("should increment by 1 if n = 1 supplied", () => {
    expect(incrementNumberBy(0, 1)).toBe(1)
  })
})
Since the function has no state and the data going in is more explicit, there are fewer things to focus on when you are trying to figure out why a test might be failing. On the tests for the counter we had to do
const counter = new Counter(0)
counter.increment()
expect(counter.count).toBe(1)
Both of the first two lines contribute to the value of counter.count. In a simple example like this 1 vs 2 lines of potentially problematic code isn't a big deal, but when you deal with a more complex object you might be adding a ton of complexity to your testing as well.
In contrast, when you write a project in a functional language, it nudges you towards keeping fancy algorithms dependent on the data flowing in and out of a particular function, rather than being dependent on the state of your system.
Another way of looking at it would be illustrating the mindset for testing a system in each paradigm.
For Functional Programming: Make sure function A works for given inputs, make sure function B works with given inputs, make sure C works with given inputs.
For OOP: Make sure Object A's method works given an input argument of X after doing Y and Z to the state of the object. Make sure Object B's method works given an input argument of X after doing W and Y to the state of the object.

The advantages of stateless programming coincide with those of goto-free programming, only more so.
Though many descriptions of functional programming emphasize the lack of mutation, the lack of mutation also goes hand in hand with the lack of unconditional control transfers, such as loops. In functional programming languages, recursion, in particular tail recursion, replaces looping. Recursion eliminates both the unconditional control construct and the mutation of variables in the same stroke. The recursive call binds argument values to parameters, rather than assigning values.
To understand why this is advantageous, rather than turning to functional programming literature, we can consult the 1968 paper by Dijkstra, "Go To Statement Considered Harmful":
"The unbridled use of the go to statement has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress."
Dijkstra's observations, however, still apply to structured programs which avoid go to, because statements like while, if and whatnot are just window dressing on go to! Without using go to, we can still find it impossible to find the coordinates in which to describe the process progress. Dijkstra neglected to observe that bridled go to still has all the same issues.
What this means is that at any given point in the execution of the program, it is not clear how we got there. When we run into a bug, we have to use backwards reasoning: how did we end up in this state? How did we branch into this point of the code? Often it is hard to follow: the trail goes back a few steps and then runs cold due to a vastness of possibilities.
Functional programming gives us the absolute coordinates. We can rely on analytical tools like mathematical induction to understand how the program arrived into a certain situation.
For example, to convince ourselves that a recursive function is correct, we can just verify its base cases, and then understand and check its inductive hypothesis.
If the logic is written as a loop with mutating variables, we need a more complicated set of tools: breaking down the logic into steps with pre- and post-conditions, which we rewrite in terms of mathematics that refers to the prior and current values of variables and such. Yes, if the program uses only certain control structures, avoiding go to, then the analysis is somewhat easier. The tools are tailored to the structures: we have a recipe for how we analyze the correctness of an if, while, and other structures.
However, by contrast, in a functional program there is no prior value of any variable to reason about; that whole class of problem has gone away.
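To illustrate the difference in reasoning, here is a small sketch in Python (the function names are mine). The comments spell out the induction argument for the recursive version and the invariant you have to carry around for the loop version. Note that CPython does not eliminate tail calls and has a recursion limit, so this is about the reasoning style, not about how you would sum a long list in Python.

```python
def total(xs: tuple) -> int:
    # Base case: the sum of an empty sequence is 0.
    if not xs:
        return 0
    # Inductive step: assuming total(xs[1:]) is correct for the shorter
    # sequence, adding the first element gives the sum of the whole one.
    return xs[0] + total(xs[1:])


def total_loop(xs) -> int:
    acc = 0
    for x in xs:
        # Invariant to maintain: before this line, acc equals the sum of
        # the elements already visited; after it, that sum including x.
        acc += x
    return acc
```

In the recursive version no name ever changes its value, so there is no "prior value of acc" to talk about at all.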

Haskell and Prolog are good examples of languages which may be implemented as stateless programming languages. But unfortunately they are not so far. Both Prolog and Haskell have imperative implementations currently. Some SMT solvers seem closer to stateless coding.
This is why you are having a hard time seeing any benefits from these programming languages. Because the implementations are imperative, we get no performance or stability benefits. So the missing infrastructure for stateless languages is the main reason you don't feel the benefits of stateless programming.
These are some benefits of pure stateless:
Task description is the program (compact code)
Stability due to the absence of state-dependent bugs (which are most bugs)
Cachable results (a set of inputs always cause same set of outputs)
Distributable computations
Rebaseable to quantum computations
Thin code for multiple overlapping clauses
Allows differentiable programming optimizations
Consistently applying code changes (adding logic breaks nothing written)
Optimized combinatorics (no need to bruteforce enumerations)
Stateless coding is about concentrating on relations between data, which are then used for computing by deduction. Basically this is the next level of programming abstraction. It is much closer to natural language than any imperative programming language because it allows describing relations instead of sequences of state changes.

Related

Global State in Functional Programming (F#)

I want to compute some functions which are dependent on some variables (specific data on which I run the code) and global variables, which are unlikely to be changed, but I want to leave them user-tunable. Just to clarify with an example, suppose I want to declare the following function:
let multiplyByGain x =
    x * gain
Where would you declare gain, given that it is a global constant for the whole project? In a separate module with constants? That would couple the module with this code, though. Or would you use a curried version:
let multiplyByGain x gain =
    x * gain
and then specialize for the specific values? But suppose you have many functions like that, you will have to inject gain to all of them (in a sort of linking module)?
In my specific problem this becomes more cumbersome because both x and gain are arrays which must have the same length (suppose I have to do an Array.zip, for example). What is the best practice in terms of functional design to address a global constant, such as gain, in a general way?
P.S.: I have found this old post, but it addresses only a specific problem.
There is no single correct answer to the question and the best approach will depend on a variety of other constraints and requirements that you have. Also, it depends on whether you are asking specifically about F# or whether you are asking about functional programming more generally. I think there are three main points:
Keeping it simple.
Using a module that exposes gain as a global value, which has some initialization code to read configuration, seems like a good default approach in F#. If this is changed only rarely (say, before you run the whole computation), then mutation is not going to cause you any trouble. You just need to be careful to avoid changing the values while some computation is still running. I think most F# programmers tend to be quite pragmatic about this, and this seems like the easiest thing to start with.
Unit testing.
If you want to unit test your multiplyByGain function with different values of gain, then you'll need some way of passing them to the function from your unit tests. In this case, having gain as an additional parameter and using currying is nice, because you can just call the function with other values of gain from your tests.
Functional programming.
Some functional language communities (especially Haskell and, sometimes, Scala) are way more strict about state. The purely functional way of keeping state would be to use monads (either the reader monad or some kind of free monad structure). This makes your code a lot more complicated (both conceptually and in terms of extra syntactic overhead), but it is a purely functional solution that eliminates state. In F#, this kind of approach is even more cumbersome, so it's not very common.
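The "extra parameter plus specialisation" option can be sketched as follows. This is in Python rather than F#, purely for illustration, with functools.partial playing the role of partial application of a curried function. DEFAULT_GAIN and apply_gain are hypothetical names, and the gain comes first so it can be fixed once.

```python
from functools import partial


def multiply_by_gain(gain, xs):
    """Pure: the gain is an explicit argument, not a global."""
    if len(gain) != len(xs):
        raise ValueError("gain and xs must have the same length")
    return [g * x for g, x in zip(gain, xs)]


# "Linking" step, done once near the program entry point, for example
# right after reading the user's configuration (values are made up).
DEFAULT_GAIN = (2.0, 0.5, 1.0)
apply_gain = partial(multiply_by_gain, DEFAULT_GAIN)
```

The rest of the program calls apply_gain([1.0, 4.0, 3.0]), while unit tests call multiply_by_gain directly with whatever gain they like.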

What is the need for immutable/persistent data structures in erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access over its objects, I don't see why we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output, it
will likely return void or unit, and that when you call a method on
the object the object itself isn’t input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
How would this work?
factorial(1) -> 1;
factorial(X) ->
    X*factorial(X-1).
If you run factorial(4), a single process will be running the same function. Each time, the function will have its own value of X. If the value of X were in the scope of the process rather than the function, recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block, you would have a point, but it would be a headache to think about where data is immutable and where it isn't.

How does functional programming avoid state when it seems unavoidable?

Let's say we define a function c = sum(a, b), functional-programming style, that returns the sum of its arguments. So far so good; all the nice things of FP without any problems.
Now let's say we run this in an environment with dynamic typing and a singleton, stateful error stream. Then let's say we pass a value of a and/or b that sum isn't designed to handle (i.e. not numbers), and it needs to indicate an error somehow.
But how? This function is supposed to be pure and side-effect-less. How does it insert an error into the global error stream without violating that?
No programming language that I know of has anything like a "singleton stateful error stream" built in, so you'd have to make one. And you simply wouldn't make such a thing if you were trying to write your program in a pure functional style.
You could, however, have a sum function that returns either the sum or an indication of an error. The type used to do this is in fact often known by the name Either. Then you could easily make a function that invokes a whole bunch of computations that could possibly return an error, and returns a list of all the errors that were encountered in the other computations. That's pretty close to what you were talking about; it's just explicitly returned rather than being global.
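Here is a minimal sketch of that idea in Python. Python has no built-in Either, so the Left/Right classes below are hand-rolled for illustration, and safe_sum and collect_errors are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Left:
    """The error case."""
    error: str


@dataclass(frozen=True)
class Right:
    """The success case."""
    value: float


Either = Union[Left, Right]


def safe_sum(a, b) -> Either:
    # Pure: the error is part of the return value, not a side effect.
    for name, v in (("a", a), ("b", b)):
        if not isinstance(v, (int, float)):
            return Left(f"{name} is not a number: {v!r}")
    return Right(a + b)


def collect_errors(results):
    # All errors from a batch of computations, explicitly returned.
    return [r.error for r in results if isinstance(r, Left)]
```

The list returned by collect_errors plays the role of the "error stream", except that it is an ordinary value the caller can inspect, pass on, or ignore.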
Remember, the question when you're writing a functional program is "how do I make a program that has the behavior I want?" not, "how would I duplicate one particular approach taken in another programming style?". A "global stateful error stream" is a means not an end. You can't have a global stateful error stream in pure function style, no. But ask yourself what you're using the global stateful error stream to achieve; whatever it is, you can achieve that in functional programming, just not with the same mechanism.
Asking whether pure functional programming can implement a particular technique that depends on side effects is like asking how you use techniques from assembly in object-oriented programming. OO provides different tools for you to use to solve problems; limiting yourself to using those tools to emulate a different toolset is not going to be an effective way to work with them.
In response to comments: If what you want to achieve with your error stream is logging error messages to a terminal, then yes, at some level the code is going to have to do IO to do that.1
Printing to a terminal is just like any other IO; there's nothing particularly special about it that makes it worthy of singling out as a case where state seems especially unavoidable. So if this turns your question into "How do pure functional programs handle IO?", then there are no doubt many duplicate questions on SO, not to mention many many blog posts and tutorials speaking precisely to that issue. It's not like it's a sudden surprise to implementors and users of pure programming languages; the question has been around for decades, and there has been some quite sophisticated thought put into the answers.
There are different approaches taken in different languages (IO monad in Haskell, unique modes in Mercury, lazy streams of requests and responses in historical versions of Haskell, and more). The basic idea is to come up with a model which can be manipulated by pure code, and hook up manipulations of the model to actual impure operations within the language implementation. This allows you to keep the benefits of purity (the proofs that apply to pure code but not to general impure code will still apply to code using the pure IO model).
The pure model has to be carefully designed so that you can't actually do anything with it that doesn't make sense in terms of actual IO. For example, Mercury does IO by having you write programs as if you're passing around the current state of the universe as an extra parameter. This pure model accurately represents the behaviour of operations that depend on and affect the universe outside the program, but only when there is exactly one state of the universe in the system at any one time, which is threaded through the entire program from start to finish. So some restrictions are put in place:
The type io is made abstract so that there's no way to construct a value of that type; the only way you can get one is to be passed one from your caller. An io value is passed into the main predicate by the language implementation to kick the whole thing off.
The mode of the io value passed in to main is declared such that it is unique. This means you can't do things that might cause it to be duplicated, such as putting it in a container or passing the same io value to multiple different invocations. The unique mode ensures that you can only pass the io value to a predicate that also uses the unique mode, and as soon as you pass it once the value is "dead" and can't be passed anywhere else.
1 Note that even in imperative programs, you gain a lot of flexibility if you have your error logging system return a stream of error messages and then only actually make the decision to print them close to the outermost layer of the program. If your log calls are directly writing the output immediately, here's just a few things I can think of off the top of my head that become much harder to do with such a system:
Speculatively execute a computation and see whether it failed by checking whether it emitted any errors
Combine multiple high level systems into a single system, adding tags to the logs to distinguish each system
Emit debug and info log messages only if there is also an error message (so the output is clean when there are no errors to debug, and rich in detail when there are)
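A sketch of what "return the log messages, decide about printing at the outer layer" could look like in Python is below. All names (parse_port, parse_all, render) and the port-parsing task itself are invented for the example.

```python
def parse_port(text):
    """Return (value_or_None, log_messages) instead of printing."""
    if not text.isdigit():
        return None, [("error", f"not a number: {text!r}")]
    port = int(text)
    if port > 65535:
        return None, [("error", f"port out of range: {port}")]
    return port, [("debug", f"parsed port {port}")]


def parse_all(texts, tag):
    # Combine several computations, tagging each message with its origin.
    values, logs = [], []
    for t in texts:
        value, messages = parse_port(t)
        values.append(value)
        logs.extend((tag, level, msg) for level, msg in messages)
    return values, logs


def render(logs):
    # Outermost layer decides: show debug lines only if an error occurred.
    has_error = any(level == "error" for _, level, _ in logs)
    return [f"[{tag}] {level}: {msg}" for tag, level, msg in logs
            if has_error or level != "debug"]
```

Only the caller of render would actually print anything, so the three things in the list above (checking for failure, tagging, conditional debug output) are all ordinary operations on a list.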

2 questions at the end of a functional programming course

Here seem to be the two biggest things I can take from the How to Design Programs (simplified Racket) course I just finished, straight from the lecture notes of the course:
1) Tail call optimization, and the lack thereof in non-functional languages:
Sadly, most other languages do not support TAIL CALL
OPTIMIZATION. Put another way, they do build up a stack
even for tail calls.
Tail call optimization was invented in the mid 70s, long
after the main elements of most languages were developed.
Because they do not have tail call optimization, these
languages provide a fixed set of LOOPING CONSTRUCTS that
make it possible to traverse arbitrary sized data.
a) What are the equivalents to this type of optimization in procedural languages that don't feature it?
b) Does using those equivalents mean we avoid building up a stack in similar situations in languages that don't have it?
2) Mutation and multicore processors
This mechanism is fundamental in almost any other language you
program in. We have delayed introducing it until now for
several reasons:
despite being fundamental, it is surprisingly complex
overuse of it leads to programs that are not amenable
to parallelization (running on multiple processors).
Since multi-core computers are now common, the ability
to use mutation only when needed is becoming more and
more important
overuse of mutation can also make it difficult to
understand programs, and difficult to test them well
But mutable variables are important, and learning this mechanism
will give you more preparation to work with Java, Python and many
other languages. Even in such languages, you want to use a style
called "mostly functional programming".
I learned some Java, Python and C++ before taking this course, so came to take mutation for granted. Now that has been all thrown in the air by the above statement. My questions are:
a) where could I find more detailed information regarding what is suggested in the 2nd bullet, and what to do about it, and
b) what kind of patterns would emerge from a "mostly functional programming" style, as opposed to a more careless style I probably would have had had I continued on with those other languages instead of taking this course?
As Leppie points out, looping constructs manage to recover the space savings of proper tail calling, for the particular kinds of loops that they support. The only problem with looping constructs is that the ones you have are never enough, unless you just hurl the ball into the user's court and force them to model the stack explicitly.
To take an example, suppose you're traversing a binary tree using a loop. It works... but you need to explicitly keep track of the "ones to come back to." A recursive traversal in a tail-calling language allows you to have your cake and eat it too, by not wasting space when not required, and not forcing you to keep track of the stack yourself.
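The binary tree case can be sketched like this in Python (the Node class and function names are mine). The loop version has to manage the "ones to come back to" by hand. Note that Python does not do tail call optimisation, and an in-order traversal is not purely tail-recursive anyway, so this only shows the bookkeeping difference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_recursive(node):
    # The language's call stack remembers where to come back to.
    if node is None:
        return []
    return (inorder_recursive(node.left)
            + [node.value]
            + inorder_recursive(node.right))


def inorder_loop(root):
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)  # explicitly track the "ones to come back to"
            node = node.left
        node = stack.pop()
        out.append(node.value)
        node = node.right
    return out
```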
Your question on parallelism and concurrency is much more wide-open, and the best pointers are probably to areas of research, rather than existing solutions. I think that most would agree that there's a crisis going on in the computing world; how do we adapt our mutation-heavy programming skills to the new multi-core world?
Simply switching to a functional paradigm isn't a silver bullet here, either; we still don't know how to write high-level code and generate blazing fast non-mutating run-concurrently code. Lots of folks are working on this, though!
To expand on the "mutability makes parallelism hard" concept, when you have multiple cores going, you have to use synchronisation if you want to modify something from one core and have it be seen consistently by all the other cores.
Getting synchronisation right is hard. If you over-synchronise, you have deadlocks, slow (serial rather than parallel) performance, etc. If you under-synchronise, you have partially-observed changes (where another core sees only a portion of the changes you made from a different core), leaving your objects observed in an invalid "halfway changed" state.
It is for that reason that many functional programming languages encourage a message-queue concept instead of a shared state concept. In that case, the only shared state is the message queue, and managing synchronisation in a message queue is a solved problem.
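As a rough illustration of the message-queue idea, here is a sketch using Python's standard `queue` and `threading` modules (the `worker` and `square_all` names are hypothetical). The workers never touch a shared variable; the only things shared between threads are the two queues, and `queue.Queue` does its own locking internally.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Each worker only reads messages and posts results; no shared variables.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            break
        outbox.put(msg * msg)


def square_all(numbers, n_workers: int = 4):
    numbers = list(numbers)
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every worker has finished, so exactly len(numbers) results are waiting.
    results = [outbox.get() for _ in numbers]
    return sorted(results)  # arrival order is not deterministic, so sort


print(square_all(range(10)))
```

This is not how Erlang or any particular functional language implements it; it only shows the shape of the pattern, with all the synchronisation hidden inside the queue.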
a) What are the equivalents to this type of optimization in procedural languages that don't feature it? b) Does using those equivalents mean we avoid building up a stack in similar situations in languages that don't have it?
Well, the significance of a tail call is that it can evaluate another function without adding to the call stack, so anything that builds up the stack can't really be called an equivalent.
A tail call behaves essentially like a jump to the new code, using the language trappings of a function call and all the appropriate detail management. So in languages without this optimization, you'd use a jump within a single function: loops, conditional blocks, or even arbitrary goto statements if nothing else works.
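A small sketch of that correspondence, in Python (which does not perform tail call elimination, so `gcd_tail` really does grow the stack here; the function names are just for illustration). The loop version is what a tail-calling implementation would effectively turn the recursive one into: rebind the parameters and jump back to the top.

```python
def gcd_tail(a: int, b: int) -> int:
    if b == 0:
        return a
    # Tail call: nothing is left to do in this frame after the call returns.
    return gcd_tail(b, a % b)


def gcd_loop(a: int, b: int) -> int:
    # Same logic with the "jump" written as a loop: the parameters are
    # overwritten and control goes back to the top, with no new stack frame.
    while b != 0:
        a, b = b, a % b
    return a


print(gcd_tail(48, 18), gcd_loop(48, 18))
```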
a) where could I find more detailed information regarding what is suggested in the 2nd bullet, and what to do about it
The second bullet sounds like an oversimplification. There are many ways to make parallelization more difficult than it needs to be, and overuse of mutation is just one.
However, note that parallelization (splitting a task into pieces that can be done simultaneously) is not entirely the same thing as concurrency (having multiple tasks executed simultaneously that may interact), though there's certainly overlap. Avoiding mutation is incredibly helpful in writing concurrent programs, since immutable data avoids a lot of race conditions and resource contention that would otherwise be possible.
b) what kind of patterns would emerge from a "mostly functional programming" style, as opposed to a more careless style I probably would have had had I continued on with those other languages instead of taking this course?
Have you looked at Haskell or Clojure? Both are heavily inclined to a very functional style emphasizing controlled mutation. Haskell is more rigorous about it but has a lot of tools for working with limited forms of mutability, while Clojure is a bit more informal and might be more familiar to you since it's another Lisp dialect.

Do purely functional languages really guarantee immutability?

In a purely functional language, couldn't one still define an "assignment" operator, say, "<-", such that the command, say, "i <- 3", instead of directly assigning the immutable variable i, would create a copy of the entire current call stack, except replacing i with 3 in the new call stack, and executing the new call stack from that point onward? Given that no data actually changed, wouldn't that still be considered "purely functional" by definition? Of course, the compiler would optimize this into simply assigning 3 to i, in which case what's the difference between imperative and purely functional?
Purely functional languages, such as Haskell, have ways of modelling imperative languages, and they are not shy about admitting it either. :)
See http://www.haskell.org/tutorial/io.html, in particular 7.5:
"So, in the end, has Haskell simply re-invented the imperative wheel? In some sense, yes. The I/O monad constitutes a small imperative sub-language inside Haskell, and thus the I/O component of a program may appear similar to ordinary imperative code. But there is one important difference: There is no special semantics that the user needs to deal with. In particular, equational reasoning in Haskell is not compromised. The imperative feel of the monadic code in a program does not detract from the functional aspect of Haskell. An experienced functional programmer should be able to minimize the imperative component of the program, only using the I/O monad for a minimal amount of top-level sequencing. The monad cleanly separates the functional and imperative program components. In contrast, imperative languages with functional subsets do not generally have any well-defined barrier between the purely functional and imperative worlds."
So the value of functional languages is not that they make state mutation impossible, but that they provide a way to allow you to keep the purely functional parts of your program separate from the state-mutating parts.
Of course, you can ignore this and write your entire program in the imperative style, but then you won't be taking advantage of the facilities of the language, so why use it?
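To show what "keeping the parts separate" looks like in practice, here is a rough sketch in Python, with made-up function names. Unlike Haskell, Python has nothing like the IO type to enforce the barrier, so the separation here is by convention only: `summarize` and `format_report` are pure, and all the I/O lives in `main`.

```python
from typing import Dict, List


def summarize(lines: List[str]) -> Dict[str, int]:
    # Pure: the result depends only on the argument, and nothing is modified.
    words = [w for line in lines for w in line.split()]
    return {"lines": len(lines), "words": len(words)}


def format_report(summary: Dict[str, int]) -> str:
    # Pure as well: just builds a string from its input.
    return f"{summary['lines']} lines, {summary['words']} words"


def main(path: str) -> None:
    # The impure shell: file access and printing happen only here.
    with open(path) as f:
        lines = f.read().splitlines()
    print(format_report(summarize(lines)))


print(format_report(summarize(["a b", "c"])))
```

The pure functions can be tested and reasoned about without touching the file system, which is the same benefit the quoted passage describes, minus the compiler checking it for you.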
Update
Your idea is not as flawed as you assume. Firstly, if someone familiar only with imperative languages wanted to loop through a range of integers, they might wonder how this could be achieved without a way to increment a counter.
But of course instead you just write a function that acts as the body of the loop, and then make it call itself. Each invocation of the function corresponds to an "iteration step". And in the scope of each invocation the parameter has a different value, acting like an incrementing variable. Finally, the runtime can note that the recursive call appears at the end of the invocation, and so it can reuse the top of the function-call stack instead of growing it (tail call). Even this simple pattern has almost all of the flavour of your idea - including the compiler/runtime quietly stepping in and actually making mutation occur (overwriting the top of the stack). Not only is it logically equivalent to a loop with a mutating counter, but in fact it makes the CPU and memory do the same thing physically.
You mention a GetStack that would return the current stack as a data structure. That would indeed be a violation of functional purity, given that it would necessarily return something different each time it was called (with no arguments). But how about a function CallWithStack, to which you pass a function of your own, and it calls back to your function and passes it the current stack as a parameter? That would be perfectly okay. CallCC works a bit like that.
Haskell doesn't readily give you ways to introspect or "execute" call stacks, so I wouldn't worry too much about that particular bizarre scheme. However, in general it is true that one can subvert the type system using unsafe "functions" such as unsafePerformIO :: IO a -> a. The idea is to make it difficult, not impossible, to violate purity.
Indeed, in many situations, such as when making Haskell bindings for a C library, these mechanisms are quite necessary... by using them you are removing the burden of proof of purity from the compiler and taking it upon yourself.
There is a proposal to actually guarantee safety by outlawing such subversions of the type system; I'm not too familiar with it, but you can read about it here.
Immutability is a property of the language, not of the implementation.
An operation a <- expr that copies data is still an imperative operation, if values that refer to the location a appear to have changed from the programmer's point of view.
Likewise, a purely functional language implementation may overwrite and reuse variables to its heart's content, as long as each modification is invisible to the programmer. For example, the map function can in principle overwrite a list instead of creating a new one, whenever the language implementation can deduce that the old list won't be needed anywhere.
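A small Python sketch of that last point (the function names are hypothetical, and no real compiler is being modelled here). `map_reusing` overwrites its input, which is indistinguishable from `map_fresh` as long as nobody else still holds the old list, and observably different as soon as somebody does.

```python
def map_fresh(f, xs):
    # Always builds a new list; the input is left untouched.
    return [f(x) for x in xs]


def map_reusing(f, xs):
    # Overwrites the input in place. Only safe if the caller holds no other
    # reference to xs, which is what an implementation would have to prove.
    for i in range(len(xs)):
        xs[i] = f(xs[i])
    return xs


def double(x):
    return 2 * x


# The input is a temporary, so nobody can tell the two versions apart.
same_result = map_fresh(double, [1, 2, 3]) == map_reusing(double, [1, 2, 3])

# The old list is still referenced, so the overwrite becomes visible.
old = [1, 2, 3]
new = map_reusing(double, old)
print(same_result, old, new)
```

In the second case `old` no longer holds its original values, so from the programmer's point of view a mutation happened. That is exactly the situation a purely functional implementation must rule out before it reuses the storage.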

Resources