What is (functional) reactive programming?

I've read the Wikipedia article on reactive programming. I've also read the small article on functional reactive programming. The descriptions are quite abstract.
What does functional reactive programming (FRP) mean in practice?
What does reactive programming (as opposed to non-reactive programming?) consist of?
My background is in imperative/OO languages, so an explanation that relates to this paradigm would be appreciated.

If you want to get a feel for FRP, you could start with the old Fran tutorial from 1998, which has animated illustrations. For papers, start with Functional Reactive Animation and then follow up on links on the publications link on my home page and the FRP link on the Haskell wiki.
Personally, I like to think about what FRP means before addressing how it might be implemented.
(Code without a specification is an answer without a question and thus "not even wrong".)
So I don't describe FRP in representation/implementation terms as Thomas K does in another answer (graphs, nodes, edges, firing, execution, etc).
There are many possible implementation styles, but no implementation says what FRP is.
I do resonate with Laurence G's simple description that FRP is about "datatypes that represent a value 'over time' ".
Conventional imperative programming captures these dynamic values only indirectly, through state and mutations.
The complete history (past, present, future) has no first class representation.
Moreover, only discretely evolving values can be (indirectly) captured, since the imperative paradigm is temporally discrete.
In contrast, FRP captures these evolving values directly and has no difficulty with continuously evolving values.
FRP is also unusual in that it is concurrent without running afoul of the theoretical & pragmatic rats' nest that plagues imperative concurrency.
Semantically, FRP's concurrency is fine-grained, determinate, and continuous.
(I'm talking about meaning, not implementation. An implementation may or may not involve concurrency or parallelism.)
Semantic determinacy is very important for reasoning, both rigorous and informal.
While concurrency adds enormous complexity to imperative programming (due to nondeterministic interleaving), it is effortless in FRP.
So, what is FRP?
You could have invented it yourself.
Start with these ideas:
Dynamic/evolving values (i.e., values "over time") are first class values in themselves. You can define them and combine them, pass them into & out of functions. I called these things "behaviors".
Behaviors are built up out of a few primitives, like constant (static) behaviors and time (like a clock), and then with sequential and parallel combination. n behaviors are combined by applying an n-ary function (on static values), "point-wise", i.e., continuously over time.
To account for discrete phenomena, have another type (family) of "events", each of which has a stream (finite or infinite) of occurrences. Each occurrence has an associated time and value.
To come up with the compositional vocabulary out of which all behaviors and events can be built, play with some examples. Keep deconstructing into pieces that are more general/simple.
So that you know you're on solid ground, give the whole model a compositional foundation, using the technique of denotational semantics, which just means that (a) each type has a corresponding simple & precise mathematical type of "meanings", and (b) each primitive and operator has a simple & precise meaning as a function of the meanings of the constituents.
Never, ever mix implementation considerations into your exploration process. If this description is gibberish to you, consult (a) Denotational design with type class morphisms, (b) Push-pull functional reactive programming (ignoring the implementation bits), and (c) the Denotational Semantics Haskell wikibooks page. Beware that denotational semantics has two parts, from its two founders Christopher Strachey and Dana Scott: the easier & more useful Strachey part and the harder and less useful (for software design) Scott part.
If you stick with these principles, I expect you'll get something more-or-less in the spirit of FRP.
Where did I get these principles? In software design, I always ask the same question: "what does it mean?".
Denotational semantics gave me a precise framework for this question, and one that fits my aesthetics (unlike operational or axiomatic semantics, both of which leave me unsatisfied).
So I asked myself: what is behavior?
I soon realized that the temporally discrete nature of imperative computation is an accommodation to a particular style of machine, rather than a natural description of behavior itself.
The simplest precise description of behavior I can think of is simply "function of (continuous) time", so that's my model.
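To make that concrete, here is a minimal sketch of the semantic model in Haskell; the names (Behavior, Event, at, lift2) are illustrative, not any particular library's API:

type Time = Double

-- a behavior denotes a value at every point in (continuous) time
newtype Behavior a = Behavior { at :: Time -> a }

-- an event denotes a stream of occurrences, each with a time and a value
newtype Event a = Event { occurrences :: [(Time, a)] }

-- the clock primitive
time :: Behavior Time
time = Behavior id

-- "parallel" combination: apply a static function point-wise over time
lift2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
lift2 f (Behavior a) (Behavior b) = Behavior (\t -> f (a t) (b t))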
Delightfully, this model handles continuous, deterministic concurrency with ease and grace.
It's been quite a challenge to implement this model correctly and efficiently, but that's another story.

In pure functional programming, there are no side-effects. For many types of software (for example, anything with user interaction) side-effects are necessary at some level.
One way to get side-effect like behavior while still retaining a functional style is to use functional reactive programming. This is the combination of functional programming, and reactive programming. (The Wikipedia article you linked to is about the latter.)
The basic idea behind reactive programming is that there are certain datatypes that represent a value "over time". Computations that involve these changing-over-time values will themselves have values that change over time.
For example, you could represent the mouse coordinates as a pair of integer-over-time values. Let's say we had something like (this is pseudo-code):
x = <mouse-x>;
y = <mouse-y>;
At any moment in time, x and y would have the coordinates of the mouse. Unlike non-reactive programming, we only need to make this assignment once, and the x and y variables will stay "up to date" automatically. This is why reactive programming and functional programming work so well together: reactive programming removes the need to mutate variables while still letting you do a lot of what you could accomplish with variable mutations.
If we then do some computations based on this, the resulting values will also be values that change over time. For example:
minX = x - 16;
minY = y - 16;
maxX = x + 16;
maxY = y + 16;
In this example, minX will always be 16 less than the x coordinate of the mouse pointer. With reactive-aware libraries you could then say something like:
rectangle(minX, minY, maxX, maxY)
And a 32x32 box will be drawn around the mouse pointer and will track it wherever it moves.
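As a hedged sketch of the same idea in Haskell, modelling each time-varying value as a plain function of time (the mouse inputs are faked with trigonometry so the snippet is self-contained):

type Time = Double

-- stand-ins for real input behaviors
mouseX, mouseY :: Time -> Int
mouseX t = 200 + round (100 * cos t)
mouseY t = 200 + round (100 * sin t)

-- defined once, "up to date" at every instant by construction
minX, minY, maxX, maxY :: Time -> Int
minX t = mouseX t - 16
minY t = mouseY t - 16
maxX t = mouseX t + 16
maxY t = mouseY t + 16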
Here is a pretty good paper on functional reactive programming.

An easy way of reaching a first intuition about what it's like is to imagine your program is a spreadsheet and all of your variables are cells. If any cell in a spreadsheet changes, any cells that refer to it change as well. It's just the same with FRP. Now imagine that some of the cells change on their own (or rather, are taken from the outside world): in a GUI situation, the position of the mouse would be a good example.
That necessarily misses out rather a lot. The metaphor breaks down pretty fast when you actually use an FRP system. For one thing, there are usually attempts to model discrete events as well (e.g. the mouse being clicked). I'm only putting this here to give you an idea of what it's like.

To me it is about two different meanings of the symbol =:
In math, x = sin(t) means that x is a different name for sin(t). So writing x + y is the same thing as sin(t) + y. Functional reactive programming is like math in this respect: if you write x + y, it is computed with whatever the value of t is at the time it's used.
In C-like programming languages (imperative languages), x = sin(t) is an assignment: it means that x stores the value of sin(t) taken at the time of the assignment.
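A small sketch of the contrast, with the imperative reading in the comment and the mathematical/FRP reading as actual Haskell:

-- imperative reading (C): double x = sin(t);  /* x keeps the snapshot */
-- mathematical/FRP reading: x *is* sin of time
x :: Double -> Double
x = sin

y :: Double -> Double
y = cos

-- x + y is understood point-wise: at any instant t it is x t + y t
xPlusY :: Double -> Double
xPlusY t = x t + y t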

OK, from background knowledge and from reading the Wikipedia page to which you pointed, it appears that reactive programming is something like dataflow computing but with specific external "stimuli" triggering a set of nodes to fire and perform their computations.
This is pretty well suited to UI design, for example, in which touching a user interface control (say, the volume control on a music playing application) might need to update various display items and the actual volume of audio output. When you modify the volume (a slider, let's say) that would correspond to modifying the value associated with a node in a directed graph.
Various nodes having edges from that "volume value" node would automatically be triggered and any necessary computations and updates would naturally ripple through the application. The application "reacts" to the user stimulus. Functional reactive programming would just be the implementation of this idea in a functional language, or generally within a functional programming paradigm.
For more on "dataflow computing", search for those two words on Wikipedia or using your favorite search engine. The general idea is this: the program is a directed graph of nodes, each performing some simple computation. These nodes are connected to each other by graph links that provide the outputs of some nodes to the inputs of others.
When a node fires or performs its calculation, the nodes connected to its outputs have their corresponding inputs "triggered" or "marked". Any node having all inputs triggered/marked/available automatically fires. The graph might be implicit or explicit depending on exactly how reactive programming is implemented.
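Here is a toy sketch of the implicit-graph variant in Haskell (names invented for illustration): a node is either an external source or a computation over other nodes, and "firing" a source amounts to re-evaluating everything downstream of it.

-- a program as a directed graph of nodes
data Node
    = Source Int                           -- hooked to the outside world
    | Unary (Int -> Int) Node              -- one incoming edge
    | Binary (Int -> Int -> Int) Node Node -- two incoming edges

-- evaluating a node pulls its inputs through the graph; replacing a
-- Source and re-evaluating models the "ripple" of updates
eval :: Node -> Int
eval (Source x)     = x
eval (Unary f n)    = f (eval n)
eval (Binary f a b) = f (eval a) (eval b)

-- e.g. the volume example: eval (Unary (* 2) (Source 5)) == 10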
Nodes can be looked at as firing in parallel, but often they are executed serially or with limited parallelism (for example, there may be a few threads executing them). A famous example was the Manchester Dataflow Machine, which (IIRC) used a tagged data architecture to schedule execution of nodes in the graph through one or more execution units. Dataflow computing is fairly well suited to situations in which asynchronously triggered cascades of computation work better than execution governed by a clock (or clocks).
Reactive programming imports this "cascade of execution" idea and seems to think of the program in a dataflow-like fashion, but with the proviso that some of the nodes are hooked to the "outside world" and the cascades of execution are triggered when these sensory-like nodes change. Program execution would then look like something analogous to a complex reflex arc. Between stimuli, the program may be basically sessile, or it may settle into a basically sessile state.
"Non-reactive" programming would be programming with a very different view of the flow of execution and of the relationship to external inputs. It's likely to be somewhat subjective, since people will be tempted to say that anything which responds to external inputs "reacts" to them. But looking at the spirit of the thing, a program that polls an event queue at a fixed interval and dispatches any events found to functions (or threads) is less reactive (because it only attends to user input at a fixed interval). Again, it's the spirit of the thing here: one can imagine putting a polling implementation with a fast polling interval into a system at a very low level and programming in a reactive fashion on top of it.

After reading many pages about FRP, I finally came across this enlightening piece of writing about FRP, and it at last made me understand what FRP really is all about.
I quote below Heinrich Apfelmus (author of reactive-banana).
What is the essence of functional reactive programming?
A common answer would be that “FRP is all about describing a system in
terms of time-varying functions instead of mutable state”, and that
would certainly not be wrong. This is the semantic viewpoint. But in
my opinion, the deeper, more satisfying answer is given by the
following purely syntactic criterion:
The essence of functional reactive programming is to specify the dynamic behavior of a value completely at the time of declaration.
For instance, take the example of a counter: you have two buttons
labelled “Up” and “Down” which can be used to increment or decrement
the counter. Imperatively, you would first specify an initial value
and then change it whenever a button is pressed; something like this:
counter := 0 -- initial value
on buttonUp = (counter := counter + 1) -- change it later
on buttonDown = (counter := counter - 1)
The point is that at the time of declaration, only the initial value
for the counter is specified; the dynamic behavior of counter is
implicit in the rest of the program text. In contrast, functional
reactive programming specifies the whole dynamic behavior at the time
of declaration, like this:
counter :: Behavior Int
counter = accumulate ($) 0
    (fmap (+1) eventUp
     `union` fmap (subtract 1) eventDown)
Whenever you want to understand the dynamics of counter, you only have
to look at its definition. Everything that can happen to it will
appear on the right-hand side. This is very much in contrast to the
imperative approach where subsequent declarations can change the
dynamic behavior of previously declared values.
So, in my understanding an FRP program is a set of equations of the form

x_i(t_{j+1}) = f_i(x_1(t_j), ..., x_n(t_j), t_{j+1})

where:
j is discrete: 1, 2, 3, 4, ...
the f_i depend on t, so this incorporates the possibility of modelling external stimuli
all the state of the program is encapsulated in the variables x_i
the FRP library takes care of progressing time, in other words, taking j to j+1.
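As a sketch (with invented names), that bookkeeping reduces to folding one global step function over the sequence of time points:

-- f computes all the x_i at t_{j+1} from their values at t_j;
-- scanl produces the whole trajectory of program states
trajectory :: ([Double] -> Double -> [Double]) -> [Double] -> [Double] -> [[Double]]
trajectory f x0 ts = scanl f x0 ts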
I explain these equations in much more detail in this video.
EDIT:
About two years after writing the original answer, I came to the conclusion that FRP implementations have another important aspect. They need to (and usually do) solve an important practical problem: cache invalidation.
The equations for the x_i describe a dependency graph. When some x_i changes at time j, then not all of the other values at j+1 need to be updated, so not all of the dependencies need to be recalculated, because some of them might be independent of x_i.
Furthermore, values that do change can be incrementally updated. For example, let's consider a map operation f = g.map(_+1) in Scala, where f and g are Lists of Ints. Here f corresponds to x_i(t_j) and g to x_j(t_j). Now if I prepend an element to g, it would be wasteful to carry out the map operation for all the elements in g. Some FRP implementations (for example reflex-frp) aim to solve this problem. This problem is also known as incremental computing.
In other words, the behaviours (the x_i) in FRP can be thought of as cached computations. It is the task of the FRP engine to efficiently invalidate and recompute these caches (the x_i) if some of the f_i do change.

The paper Simply efficient functional reactivity by Conal Elliott (direct PDF, 233 KB) is a fairly good introduction. The corresponding library also works.
The paper has since been superseded by another paper, Push-pull functional reactive programming (direct PDF, 286 KB).

Disclaimer: my answer is in the context of rx.js - a 'reactive programming' library for Javascript.
In functional programming, instead of iterating through each item of a collection, you apply higher order functions (HoFs) to the collection itself. So the idea behind FRP is that instead of processing each individual event, create a stream of events (implemented with an observable*) and apply HoFs to that instead. This way you can visualize the system as data pipelines connecting publishers to subscribers.
The major advantages of using an observable are:
i) it abstracts away state from your code, e.g., if you want the event handler to get fired only for every 'n'th event, or to stop firing after the first 'n' events, or to start firing only after the first 'n' events, you can just use HoFs (filter, takeUntil, skip respectively) instead of setting, updating and checking counters (see the sketch after the next point).
ii) it improves code locality - if you have 5 different event handlers changing the state of a component, you can merge their observables and define a single event handler on the merged observable instead, effectively combining 5 event handlers into 1. This makes it very easy to reason about what events in your entire system can affect a component, since it's all present in a single handler.
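To illustrate point (i), here is a sketch in Haskell terms, treating the event stream as a plain list of occurrences; the rx.js operators work analogously on observables:

-- every n'th event: a pure transformation instead of a mutable counter
everyNth :: Int -> [a] -> [a]
everyNth n xs = [x | (i, x) <- zip [1 ..] xs, i `mod` n == 0]

-- stop after the first n events / start only after the first n events
firstN, afterN :: Int -> [a] -> [a]
firstN = take
afterN = drop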
* An Observable is the dual of an Iterable.
An Iterable is a lazily consumed sequence - each item is pulled by the iterator whenever it wants to use it, and hence the enumeration is driven by the consumer.
An observable is a lazily produced sequence - each item is pushed to the observer whenever it is added to the sequence, and hence the enumeration is driven by the producer.
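One way to see that duality, sketched in Haskell with invented types (the real Rx interfaces also carry errors, completion and unsubscription, but the shape is the point):

-- pull: the consumer asks the sequence for its next item
newtype Iterable a = Iterable { next :: IO (Maybe a) }

-- push: the producer hands each item to the consumer as it appears
newtype Observable a = Observable { subscribe :: (a -> IO ()) -> IO () }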

Dude, this is a freaking brilliant idea! Why didn't I find out about this back in 1998? Anyway, here's my interpretation of the Fran tutorial. Suggestions are most welcome, I am thinking about starting a game engine based on this.
import pygame
from pygame.surface import Surface
from pygame.sprite import Sprite, Group
from pygame.locals import *
from time import time as epoch_delta
from math import sin, pi
from copy import copy

pygame.init()
screen = pygame.display.set_mode((600, 400))
pygame.display.set_caption('Functional Reactive System Demo')

class Time:
    def __float__(self):
        return epoch_delta()
time = Time()

class Function:
    def __init__(self, var, func, phase = 0., scale = 1., offset = 0.):
        self.var = var
        self.func = func
        self.phase = phase
        self.scale = scale
        self.offset = offset
    def copy(self):
        return copy(self)
    def __float__(self):
        return self.func(float(self.var) + float(self.phase)) * float(self.scale) + float(self.offset)
    def __int__(self):
        return int(float(self))
    def __add__(self, n):
        result = self.copy()
        result.offset += n
        return result
    def __mul__(self, n):
        result = self.copy()
        result.scale *= n   # scale multiplicatively (was "+=", a bug)
        result.offset *= n  # any existing offset must be scaled as well
        return result
    def __neg__(self):      # unary minus (was "__inv__", which Python never calls)
        result = self.copy()
        result.scale *= -1.
        return result
    def __abs__(self):
        return Function(self, abs)

def FuncTime(func, phase = 0., scale = 1., offset = 0.):
    global time
    return Function(time, func, phase, scale, offset)

def SinTime(phase = 0., scale = 1., offset = 0.):
    return FuncTime(sin, phase, scale, offset)
sin_time = SinTime()

def CosTime(phase = 0., scale = 1., offset = 0.):
    phase += pi / 2.
    return SinTime(phase, scale, offset)
cos_time = CosTime()

class Circle:
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius
    @property
    def size(self):
        return [self.radius * 2] * 2

circle = Circle(
    x = cos_time * 200 + 250,
    y = abs(sin_time) * 200 + 50,
    radius = 50)

class CircleView(Sprite):
    def __init__(self, model, color = (255, 0, 0)):
        Sprite.__init__(self)
        self.color = color
        self.model = model
        self.image = Surface([model.radius * 2] * 2).convert_alpha()
        self.rect = self.image.get_rect()
        pygame.draw.ellipse(self.image, self.color, self.rect)
    def update(self):
        # re-sampling the time-varying model each frame keeps the view current
        self.rect[:] = int(self.model.x), int(self.model.y), self.model.radius * 2, self.model.radius * 2

circle_view = CircleView(circle)
sprites = Group(circle_view)

running = True
while running:
    for event in pygame.event.get():
        if event.type == QUIT:
            running = False
        if event.type == KEYDOWN and event.key == K_ESCAPE:
            running = False
    screen.fill((0, 0, 0))
    sprites.update()
    sprites.draw(screen)
    pygame.display.flip()
pygame.quit()
In short: If every component can be treated like a number, the whole system can be treated like a math equation, right?

Paul Hudak's book, The Haskell School of Expression, is not only a fine introduction to Haskell, but it also spends a fair amount of time on FRP. If you're a beginner with FRP, I highly recommend it to give you a sense of how FRP works.
There is also what looks like a new rewrite of this book (released 2011, updated 2014), The Haskell School of Music.

According to the previous answers, it seems that, mathematically, we simply think in a higher order. Instead of thinking of a value x of type X, we think of a function x: T → X, where T is the type of time, be it the natural numbers, the integers or the continuum. Now when we write y := x + 1 in the programming language, we actually mean the equation y(t) = x(t) + 1.
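In Haskell this higher-order reading is almost literal, since functions from time form a functor (a sketch):

type Time = Double

x :: Time -> Int
x = floor          -- some time-varying value, for illustration

-- y := x + 1 read as the equation y(t) = x(t) + 1
y :: Time -> Int
y = fmap (+ 1) x   -- fmap over (->) Time is just composition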

Acts like a spreadsheet, as noted. Usually based on an event-driven framework.
As with all "paradigms", its newness is debatable.
From my experience of distributed flow networks of actors, it can easily fall prey to a general problem of state consistency across the network of nodes, i.e. you end up with a lot of oscillation and trapping in strange loops.
This is hard to avoid as some semantics imply referential loops or broadcasting, and can be quite chaotic as the network of actors converges (or not) on some unpredictable state.
Similarly, some states may not be reached, despite having well-defined edges, because the global state steers away from the solution. 2+2 may or may not get to be 4 depending on when the 2's became 2, and whether they stayed that way. Spreadsheets have synchronous clocks and loop detection. Distributed actors generally don't.
All good fun :).

I found this nice video on the Clojure subreddit about FRP. It is pretty easy to understand even if you don't know Clojure.
Here's the video: http://www.youtube.com/watch?v=nket0K1RXU4
Here's the source the video refers to in the 2nd half: https://github.com/Cicayda/yolk-examples/blob/master/src/yolk_examples/client/autocomplete.cljs

This article by Andre Staltz is the best and clearest explanation I've seen so far.
Some quotes from the article:
Reactive programming is programming with asynchronous data streams.
On top of that, you are given an amazing toolbox of functions to combine, create and filter any of those streams.
The fantastic diagrams that are part of the article are also worth seeing.

It is about mathematical data transformations over time (or ignoring time).
In code this means functional purity and declarative programming.
State bugs are a huge problem in the standard imperative paradigm. Various bits of code may change some shared state at different "times" in the program's execution. This is hard to deal with.
In FRP you describe (like in declarative programming) how data transforms from one state to another and what triggers it. This allows you to ignore time because your function is simply reacting to its inputs and using their current values to create a new one. This means that the state is contained in the graph (or tree) of transformation nodes and is functionally pure.
This massively reduces complexity and debugging time.
Think of the difference between A=B+C in math and A=B+C in a program.
In math you are describing a relationship that will never change. In a program, it says that "right now" A is B+C. But the next command might be B++, in which case A is no longer equal to B+C. In math or declarative programming, A will always be equal to B+C no matter what point in time you ask.
So by removing the complexities of shared state and values changing over time, your program becomes much easier to reason about.
An EventStream is an EventStream + some transformation function.
A Behaviour is an EventStream + Some value in memory.
When the event fires, the value is updated by running the transformation function. The value that this produces is stored in the behaviour's memory.
Behaviours can be composed to produce new behaviours that are a transformation on N other behaviours. This composed value will recalculate as the input events (behaviours) fire.
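A sketch of that description with invented names (real libraries differ in detail): the behaviour's stored value is a fold of the transformation function over the event occurrences.

-- an event stream as the list of its occurrence values (times omitted)
type EventStream a = [a]

-- the successive values in the behaviour's memory: the initial value,
-- updated by the transformation function each time an event fires
behaviour :: s -> (a -> s -> s) -> EventStream a -> [s]
behaviour v0 f = scanl (flip f) v0

-- e.g. the counter from earlier: behaviour 0 (+) [1, 1, -1] == [0, 1, 2, 1]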
"Since observers are stateless, we often need several of them to simulate a state machine as in the drag example. We have to save the state where it is accessible to all involved observers such as in the variable path above."
Quote from - Deprecating The Observer Pattern
http://infoscience.epfl.ch/record/148043/files/DeprecatingObserversTR2010.pdf

A short and clear explanation of reactive programming appears on Cyclejs - Reactive Programming; it uses simple and visual samples.
A [module/Component/object] is reactive means it is fully responsible
for managing its own state by reacting to external events.
What is the benefit of this approach? Inversion of Control,
mainly because the [module/Component/object] is responsible for itself, improving encapsulation by using private methods instead of public ones.
It is a good starting point, not a complete source of knowledge. From there you could jump to more complex and deep papers.

Check out Rx, Reactive Extensions for .NET. They point out that with IEnumerable you are basically 'pulling' from a stream. Linq queries over IQueryable/IEnumerable are set operations that 'suck' the results out of a set. But with the same operators over IObservable you can write Linq queries that 'react'.
For example, you could write a Linq query like
(from m in MyObservableSetOfMouseMovements
 where m.X < 100 && m.Y < 100
 select new Point(m.X, m.Y))
and with the Rx extensions, that's it: You have UI code that reacts to the incoming stream of mouse movements and draws whenever you're in the 100,100 box...

FRP is a combination of functional programming (a paradigm built upon the idea that everything is a function) and the reactive programming paradigm (built upon the idea that everything is a stream; the observer and observable philosophy). It is supposed to be the best of both worlds.
Check out Andre Staltz's post on reactive programming to start with.

Related

"Mathematical state" with functional languages?

I've read some of the discussions here, as well as followed links to other explanations, but I'm still not able to understand the mathematical connection between "changing state" and "not changing state" as it pertains to our functional programming versus non-FP debate. As I understand it, the basic argument goes back to the pure math definition of a function, whereby a function maps a domain member to only one range member. This is then compared to a computer code function: given certain input, it will always produce the same output, i.e., not vary from use to use; i.e., the function's state, as in its domain-to-range mapping behavior, will not change.
Then it gets foggy in my mind. Here's an example. Let's say I want to display closed block-like polygons on an x-y field. In GIS software, I understand, everything is stored as directed, closed graphs, i.e. a square is four vectors, their heads and ends connected. The raw data representation is just the individual Cartesian start and end points of each vector. And of course, there might be a function in the software that "processes" all these coordinate sets. Good. But what about representing each polygon in a mathematical way, e.g., a rectangle in the positive x, negative y quadrant might be:
Z = {(x,y) | 3 <= x <= 5, -2 <= y <= -1}
So we'd have many Z-like functions, each one expressing an individual polygon -- and not being a whiz with my matrix math, maybe these "functions" could then be represented as matrices . . . but I digress.
So with the usual raw vector-data method, I've got one function in my code that "changes state" as it processes each set of coordinates and then draws each polygon (and then deals with polygons changing), while the one-and-only-one-Z-like-function-per-polygon method would seem to hold to the "don't change state" rule exactly. Right? Or am I way off here? It seems like the old-fashioned, one-function-processing-raw-coordinate-data is not mutating the domain-range purity law either. I'm confused....
Part of my inspiration came from reading about a new idea of image processing where instead of slamming racks of pixels, each "frame" would be represented by one big function capable of "gnu-plotting" the whole image, edges, colors, gradients, etc. Is this germane? I guess I'm trying to fathom why I would want to represent, say, a street map of polygons (e.g. city blocks) one way or the other. I keep hearing functional language advocates dance around the idea that a mathematical function is pure and safe and good and ultimately Utopian, while the non-FP software function is some sort of sloppy kludge holding us back from Borg-like bliss.
But even more confusing is memory management vis-a-vis FP versus non-FP. What I keep hearing (e.g. parallel programming) is that FP isn't changing a "memory state" as much as, say, a C/C++ program does. Is this like the Google File System where literally everything is just sitting out there in a virtual memory pool, rather than being data moved in and out of databases and memory locations? Somehow all these things are related. Therefore, it seems like the perfect FP program is just one single function (possibly made up of many sub-functions) doing one single task -- although a quick glance at any elisp code seems to be a study of programming schizophrenia on this count.
Referential transparency in programming (and mathematics, logic, etc.) is the principle that the meaning or value of an expression can be determined without needing any non-local context, and that the value of an expression doesn't change. Code like
int x = 0;
int nextX() {
    return x++;
}
violates referential transparency in that nextX() will at one moment return 32, and at the next invocation return 33, and there is no way to determine, based only on local analysis, what nextX() will return in any given location. It is easy in many cases to turn a non-referentially transparent procedure into a referentially transparent function by adding an argument to the procedure. For instance, in the example just given, the addition of a parameter currentX makes nextX referentially transparent:
int nextX( int currentX ) {
    return currentX + 1;
}
This does require, of course, that every time nextX is called, the previous value is available.
For procedures whose entire purpose is to modify state (e.g., the state of the screen), this doesn't make as much sense. For instance, while we could write a method print which is referentially transparent in one sense:
int print( int x ) {
    printf( "%d", x );
    return x;
}
there's still a sort of problem in that the state of the system is modified. Methods that ask about the state of the screen will have different results before and after a call to print, for instance. To make these kinds of procedures referentially transparent, they can be augmented with an argument representing the state of the system. For instance:
// print x to screen, and return the new screen that results
Screen print( int x, Screen screen ) {
    ...
}

// return the contents of screen
ScreenContents returnContentsOfScreen( Screen screen ) {
    ...
}
Now we have referential transparency, though at the expense of having to pass Screen objects around. For instance:
Screen screen0 = getInitialScreen();
Screen screen1 = print( 2, screen0 );
Screen screen2 = print( 3, screen1 );
...
This probably feels like overkill for working with IO, since the intent is, after all, to modify some state (namely, the screen, or filesystem, or …). Most programming languages, as a result, don't make IO methods referentially transparent. Some, like Haskell, however, do. Since doing it as just shown is rather cumbersome, these languages will typically have some syntax to make things a bit cleaner. In Haskell, this is accomplished by monads and do notation (which is really out of scope for this answer). If you're interested in how the monad concept is used to achieve this, you might be interested in this article: You Could Have Invented Monads! (And Maybe You Already Have.)

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.
The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type system. The ST monad carries a state-thread parameter, usually denoted with a type variable s. When you have such a computation, you'll have a monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
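For instance, here is a small example (a sketch using Data.STRef from base): a function that is imperative inside but pure at its boundary.

import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- summing with a mutable accumulator; callers only ever see a pure function
sumST :: [Int] -> Int
sumST xs = runST $ do
    acc <- newSTRef 0
    mapM_ (\x -> modifySTRef' acc (+ x)) xs
    readSTRef acc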
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep most of the performance characteristics of the in-place sorting algorithms and still get purity.
In this case it would look something like:
-- assuming: import Control.Monad.ST (runST)
--           import Data.Vector (Vector, thaw, freeze)
--           import Data.Vector.Algorithms.Intro (sort)  -- vector-algorithms package
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
    mutableVector <- thaw vector
    sort mutableVector
    freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.
"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, immutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknownst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibility infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Goto's are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just (λx f(x)) 1. The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what became an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" (which has the "infinite mutability" mentioned above) may itself be treated as an infinitary wrapper: "if (x < n) { x = x + 1; if (x < n) { x = x + 1; if (x < n) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n? (λx x < n? (λx ...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled as "goto A" and "goto B", respectively. So, by magic, the very essence of Imperativitity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the monad-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary-based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factor through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

understanding referential transparency

Generally, I have a headache because something is wrong with my reasoning:
For one set of arguments, a referentially transparent function will always return one set of output values.
that means that such a function can be represented as a truth table (a table where one set of output parameters is specified for each set of arguments).
that makes the logic behind such functions combinational (as opposed to sequential)
that means that with a purely functional language (one that has only RT functions) it is possible to describe only combinational logic.
The last statement is derived from this reasoning, but it's obviously false; that means there is an error in the reasoning. [Question: where is the error in this reasoning?]
UPD2. You guys are saying lots of interesting stuff, but not answering my question. I have defined it more explicitly now; sorry for messing up the question definition!
Question: where is the error in this reasoning?
A referentially transparent function might require an infinite truth table to represent its behavior. You will be hard pressed to design an infinite circuit in combinational logic.
Another error: the behavior of sequential logic can be represented purely functionally as a function from states to states. The fact that in the implementation these states occur sequentially in time does not prevent one from defining a purely referentially transparent function which describes how state evolves over time.
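For example (a sketch): the behaviour of a piece of sequential logic is a pure fold of its transition function over the input sequence.

-- run a state machine purely: scanl yields the state after each input
run :: (s -> i -> s) -> s -> [i] -> [s]
run step s0 inputs = scanl step s0 inputs

-- e.g. a toggle flip-flop:
--   run (\s () -> not s) False [(), (), ()] == [False, True, False, True]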
Edit: Although I apparently missed the bullseye on the actual question, I think my answer is pretty good, so I'm keeping it :-) (see below).
I guess a more concise way to phrase the question might be: can a purely functional language compute anything an imperative one can?
First of all, suppose you took an imperative language like C and made it so you can't alter variables after defining them. E.g.:
int i;
for (i = 0; // okay, that's one assignment
     i < 10; // just looking, that's all
     i++)    // BUZZZ! Sorry, can't do that!
Well, there goes your for loop. Do we get to keep our while loop?
while (i < 10)
Sure, but it's not very useful. i can't change, so it's either going to run forever or not run at all.
How about recursion? Yes, you get to keep recursion, and it's still plenty useful:
int sum(int *items, unsigned int count)
{
    if (count) {
        // count the first item and sum the rest
        return *items + sum(items + 1, count - 1);
    } else {
        // no items
        return 0;
    }
}
Now, with functions, we don't alter state, but variables can, well, vary. Once a variable passes into our function, it's locked in. However, we can call the function again (recursion), and it's like getting a brand new set of variables (the old ones stay the same). Although there are multiple instances of items and count, sum((int[]){1,2,3}, 3) will always evaluate to 6, so you can replace that expression with 6 if you like.
Can we still do anything we want? I'm not 100% sure, but I think the answer is "yes". You certainly can if you have closures, though.
You have it right. The idea is, once a variable is defined, it can't be redefined. A referentially transparent expression, given the same variables, always yields the same result value.
I recommend looking into Haskell, a purely functional language. Haskell doesn't have an "assignment" operator, strictly speaking. For instance:
my_sum numbers = ??? where
    i = 0
    total = 0
Here, you can't write a "for loop" that increments i and total as it goes along. All is not lost, though. Just use recursion to keep getting new i's and totals:
my_sum numbers = f 0 0 where
    f i total =
        if i < length numbers
            then f i' total'
            else total
      where
        i' = i + 1
        total' = total + (numbers !! i)
(Note that this is a stupid way to sum a list in Haskell, but it demonstrates a method of coping with single assignment.)
Now, consider this highly imperative-looking code:
main = do
    a <- readLn
    b <- readLn
    print (a + b)
It's actually syntactic sugar for:
main =
    readLn >>= (\a ->
    readLn >>= (\b ->
    print (a + b)))
The idea is, instead of main being a function consisting of a list of statements, main is an IO action that Haskell executes, and actions are defined and chained together with bind operations. Also, an action that does nothing, yielding an arbitrary value, can be defined with the return function.
Note that bind and return aren't specific to actions. They can be used with any type that calls itself a Monad to do all sorts of funky things.
To clarify, consider readLn. readLn is an action that, if executed, would read a line from standard input and yield its parsed value. To do something with that value, we can't store it in a variable because that would violate referential transparency:
a = readLn
If this were allowed, a's value would depend on the world and would be different every time we called readLn, meaning readLn wouldn't be referentially transparent.
Instead, we bind the readLn action to a function that deals with the action, yielding a new action, like so:
readLn >>= (\x -> print (x + 1))
The result of this expression is an action value. If Haskell got off the couch and performed this action, it would read an integer, increment it, and print it. By binding the result of an action to a function that does something with the result, we get to keep referential transparency while playing around in the world of state.
As far as I understand it, referential transparency just means: a given function will always yield the same result when invoked with the same arguments. So, the mathematical functions you learned about in school are referentially transparent.
A language you could check out in order to learn how things are done in a purely functional language would be Haskell. There are ways to use "updateable storage possibilities" like the Reader Monad, and the State Monad for example. If you're interested in purely functional data structures, Okasaki might be a good read.
And yes, you're right: order of evaluation in a purely functional language like Haskell does not matter as it does in non-functional languages, because if there are no side effects, there is no reason to do something before/after something else, unless the input of one depends on the output of the other, or means like monads come into play.
I don't really know about the truth-table question.
Here's my stab at answering the question:
Any system can be described as a combinatorial function, large or small.
There's nothing wrong with the reasoning that pure functions can only deal with combinatorial logic -- it's true, just that functional languages hide that from you to some extent or another.
You could even describe, say, the workings of a game engine as a truth table or a combinatorial function.
You might have a deterministic function that takes in "the current state of the entire game" as the RAM occupied by the game engine and the keyboard input, and returns "the state of the game one frame later". The return value would be determined by the combinations of the bits in the input.
Of course, in any meaningful and sane function, the input is parsed down to blocks of integers, decimals and booleans, but the combinations of the bits in those values is still determining the output of your function.
Keep in mind also that basic digital logic can be described in truth tables. The only reason that that's not done for anything more than, say, arithmetic on 4-bit integers, is because the size of the truth table grows exponentially.
The error in your reasoning is the following:
"that means that such a function can be represented as a truth table".
You conclude that from a functional language's property of referential transparency. So far the conclusion sounds plausible, but you overlook that a function is able to accept collections as input and process them, in contrast to the fixed inputs of a logic gate.
Therefore a function does not equal a logic gate, but rather a construction plan for such a logic gate that depends on the actual input, which is determined at runtime!
To comment on your comment: functional languages can, although stateless, implement a state machine by constructing the states from scratch each time they are being accessed.

Pure functional bottom up tree algorithm

Say I wanted to write an algorithm working on an immutable tree data structure that has a list of leaves as its input. It needs to return a new tree with changes made to the old tree going upwards from those leaves.
My problem is that there seems to be no way to do this purely functionally without reconstructing the entire tree, checking at the leaves whether they are in the list, because you always need to return a complete new tree as the result of an operation and you can't mutate the existing tree.
Is this a basic problem in functional programming that only can be avoided by using a better suited algorithm or am I missing something?
Edit: I not only want to avoid recreating the entire tree; the functional algorithm should also have the same time complexity as the mutating variant.
The most promising thing I have seen so far (and admittedly "so far" is not very long...) is the Zipper data structure: it basically keeps a separate structure, a reverse path from the node to the root, and does local edits on this separate structure.
It can do multiple local edits, most of which are constant time, and write them back to the tree (reconstructing the path to the root, which contains the only nodes that need to change) all in one go.
The Zipper is in the standard library in Clojure (see the heading Zippers - Functional Tree Editing).
And there's the original paper by Huet with an implementation in OCaml.
Disclaimer: I have been programming for a long time, but only started functional programming a couple of weeks ago, and had never even heard of the problem of functional editing of trees until last week, so there may very well be other solutions I'm unaware of.
Still, it looks like the Zipper does most of what one could wish for. If there are other alternatives at O(log n) or below, I'd like to hear them.
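For a flavour of it, here is a minimal binary-tree zipper sketched in Haskell (Huet's paper uses OCaml; this is a simplification, not his exact interface): the context records the reversed path back to the root, edits at the focus are O(1), and only that path is rebuilt on the way up.

data Tree a = Leaf a | Node (Tree a) (Tree a)

-- one step of the path from the focus back to the root
data Ctx a
    = Top
    | InLeft (Ctx a) (Tree a)  -- we went left; the right sibling waits here
    | InRight (Tree a) (Ctx a) -- we went right; the left sibling waits here

type Zipper a = (Tree a, Ctx a)

-- navigation (partial functions, kept short for the sketch)
downLeft, downRight, up :: Zipper a -> Zipper a
downLeft  (Node l r, c) = (l, InLeft c r)
downRight (Node l r, c) = (r, InRight l c)
up (t, InLeft c r)  = (Node t r, c)
up (t, InRight l c) = (Node l t, c)

-- replace the focused subtree: constant time, no global reconstruction
put :: Tree a -> Zipper a -> Zipper a
put t (_, c) = (t, c)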
You may enjoy reading
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!248.entry
This depends on your functional programming language. For instance in Haskell, which is a lazy functional programming language, results are calculated at the last moment, when they are actually needed.
In your example the assumption is that because your function creates a new tree, the whole tree must be processed, whereas in reality the function is just passed on to the next function and only executed when necessary.
A good example of lazy evaluation is the sieve of Eratosthenes in Haskell, which creates the prime numbers by eliminating the multiples of the current number from the list of numbers. Note that the list of numbers is infinite. Taken from here:
primes :: [Integer]
primes = sieve [2..]
    where
        sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]
I recently wrote an algorithm that does exactly what you described - https://medium.com/hibob-engineering/from-list-to-immutable-hierarchy-tree-with-scala-c9e16a63cb89
It works in 2 phases:
Sort the list of nodes by their depth in the hierarchy
Construct the tree from the bottom up
Some caveats:
No node mutation; the result is an immutable tree
The complexity is O(n)
Cyclic references in the incoming list are ignored

The difference between MapReduce and the map-reduce combination in functional programming

I read about MapReduce at http://en.wikipedia.org/wiki/MapReduce and understood the example of how to get the count of a "word" in many "documents". However, I did not understand the following line:
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
Can someone elaborate on the difference again (MapReduce framework vs. the map and reduce combination)? Especially, what does reduce do in functional programming?
Thanks a great deal.
The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology of performing calculations in a massively parallelised fashion. This methodology builds upon the map / reduce construct which was well known for years before, but goes beyond into such matters as distributing the data etc. Also, some constraints are imposed on the structure of data being operated upon and returned by the functions used in the map-like and reduce-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map & reduce combination.
As for the Wikipedia comment on the function being mapped in the functional programming's map / reduce construct producing one value per input... Well, sure it does, but here there are no constraints at all on the type of said value. In particular, it could be a complex data structure like perhaps a list of things to which you would again apply a map / reduce transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-word-counting-like computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit -- merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a counting words algorithm implemented with map and reduce, MapReduce-style, right there. The reason why it might not fully qualify as being an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
For in-depth treatment of the reduce function -- also known as fold -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... Things like ++ = list concatenation, no deep Haskell magic.
Using the word count example, the original functional map() would take a set of documents, optionally distribute subsets of that set, and for each document emit a single value representing the number of words (or a particular word's occurrences) in the document. A functional reduce() would then add up the per-document counts into one global total. So you get a total count (either of all words or of a particular word).
In MapReduce, the map would emit a (word, count) pair for each word in each document. A MapReduce reduce() would then add up the count of each word in each document without mixing them into a single pile. So you get a list of words paired with their counts.
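To make the contrast concrete, here is a sketch in Haskell (using Data.Map from the containers package): a plain functional reduce collapses everything into one combined value, while a MapReduce-style reduce runs once per key after grouping.

import qualified Data.Map as Map

docs :: [String]
docs = ["a rose is a rose", "war is war"]

-- functional map + reduce: one single combined value (a total count)
totalWords :: Int
totalWords = sum (map (length . words) docs)  -- 8

-- MapReduce style: the map phase emits (word, 1) pairs; the reduce
-- phase sums per key, yielding a list of (word, count) pairs
countsPerWord :: Map.Map String Int
countsPerWord = Map.fromListWith (+) [(w, 1) | doc <- docs, w <- words doc]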
MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar idiom of map and reduce - if you can structure your tasks such that they can be performed by independent mappers and reducers, then you can write it in a way which takes advantage of a MapReduce framework.
Imagine a Python interpreter which recognized tasks which could be computed independently, and farmed them out to mapper or reducer nodes. If you wrote
reduce(lambda x, y: x+y, map(int, ['1', '2', '3']))
or
sum([int(x) for x in ['1', '2', '3']])
you would be using functional map and reduce methods in a MapReduce framework. With current MapReduce frameworks, there's a lot more plumbing involved, but it's the same concept.
