Tail calls in architectures without a call stack

Tail calls in architectures without a call stack - functional-programming

My answer for a recent question about GOTOs and tail recursion was phrased in terms of a call stack. I'm worried that it wasn't sufficiently general, so I ask you: how is the notion of a tail call (or equivalent) useful in architectures without a call stack?
In continuation passing, all called functions replace the calling function, and are thus tail calls, so "tail call" doesn't seem to be a useful distinction. In messaging and event based architectures, there doesn't seem to be an equivalent, though please correct me if I'm wrong. The latter two architectures are interesting cases as they are associated with OOP, rather than FP. What about other architectures? Were the old Lisp machines based on call-stacks?
Edit: According to "What the heck is: Continuation Passing Style (CPS)" (and Alex below), the equivalent of a tail call under continuation passing is not "called function replaces calling function" but "calling function passes the continuation it was given, rather than creating a new continuation". This type of tail call is useful, unlike what I asserted.
Also, I'm not interested in systems that use call stacks at a lower level, for the higher level doesn't require a call stack. This restriction doesn't apply to Alex's answer because he's writing about the fact that other invocation architectures (is this the right term?) often have a call stack equivalent, not that they have a call stack somewhere under the hood. In the case of continuation passing, the structure is like an arborescence, but with edges in the opposite direction. Call stack equivalents are highly relevant to my interests.

"Architectures without a call stack" typically "simulate" one at some level -- for example, back in the time of IBM 360, we used the S-Type Linkage Convention using register-save areas and arguments-lists indicated, by convention, by certain general-purpose registers.
So "tail call" can still matter: does the calling function need to preserve the information necessary to resume execution after the calling point (once the called function is finished), or does it know there IS going to be no execution after the calling point, and so simply reuse its caller's "info to resume execution" instead?
So for example a tail call optimization might mean not appending the continuation needed to resume execution on whatever linked list is being used for the purpose... which I like to see as a "call stack simulation" (at some level, though it IS obviously a more flexible arrangement -- don't want to have continuation-passing fans jumping all over my answer;-).

On the off chance that this question interests someone other than me, I have an expanded answer for the other question that also answers this one. Here's the nutshell, non-rigorous version.
When a computational system performs sub-computations (i.e. a computation starts and must pause while another computation is performed because the first depends on the result of the second), a dependency relation between execution points naturally arises. In call-stack based architectures, the relation is topologically a path graph. In CPS, it's a tree, where every path between the root and a node is a continuation. In message passing and threading, it's a collection of path graphs. Synchronous event handling is basically message passing. Starting a sub-calculation involves extending the dependency relation, except in a tail call which replaces a leaf rather than appending to it.
Translating tail calling to asynchronous event handling is more complex, so instead consider a more general version. If A is subscribed to an event on channel 1, B is subscribed to the same event on channel 2 and B's handler merely fires the event on channel 1 (it translates the event across channels), then A can be subscribed to the event on channel 2 instead of subscribing B. This is more general because the equivalent of a tail call requires that
A's subscription on channel 1 be canceled when A is subscribed on channel 2
the handlers are self-unsubscribing (when invoked, they cancel the subscription)
Now for two systems that don't perform sub-computations: lambda calculus (or term rewriting systems in general) and RPN. For lambda calculus, tail calls roughly correspond to a sequence of reductions where the term length is O(1) (see iterative processes in SICP section 1.2). Take RPN to use a data stack and an operations stack (as opposed to a stream of operations; the operations are those yet to be processed), and an environment that maps symbols to a sequence of operations. Tail calls could correspond to processes with O(1) stack growth.

Related

Understanding (sub,partial,full,one-shot) continuations (in procedural languages)

After reading almost everything I could find about continuations, I still have trouble understanding them. Maybe because all of the explanations are heavily tied with lambda calculus, which I have trouble understanding.
In general, a continuation is some representation of what to do next, after you have finished doing the current thing, i.e. the rest of the computation.
But then, it gets trickier with all the variety. Maybe some of you could help me out with my custom analogy here and point out where I have made mistakes in my understanding.
Let's say our functions are represented as objects, and for the sake of simplicity:
Our interpreter has a stack of function calls.
Each function call has a stack for local data and arguments.
Each function call has a queue of "instructions" to be executed that operate on the local data stack and the queue itself (and, maybe on the stacks of callers too).
The analogy is meant to be similar to XY concatenative language.
So, in my understanding:
A continuation is the rest of the whole computation (this unfinished queue of instructions + stack of all subsequent computations: queues of callers).
A partial continuation is the current unfinished queue + some delimited part of the caller stack, up until some point (not the full one, for the whole program).
A sub-continuation is the rest of the current instruction queue for currently "active" function.
A one-shot continuation is such a continuation that can be executed only once,
after being reified into an object.
Please correct me if I am wrong in my analogies.

A continuation is the rest of the whole computation (this unfinished queue of instructions + stack of all subsequent computations: queues of callers)
Informally, your understanding is correct. However, continuation as a concept is an abstraction of the control state and the state the process should attain if the continuation is invoked. It needs not contain the entire stack explicitly, as long as the state can be attained. This is one of the classical definitions of continuation. Refer to Rhino JavaScript doc.
A partial continuation is the current unfinished queue + some delimited part of the caller stack, up until some point (not the full one, for the whole program)
Again, informally, your understanding is correct. It is also called delimited continuation or composable continuation. In this case, not only the process needs to attain the state represented by the delimited continuation, but it also needs to limit the invocation of the continuation to the limit specified. This is in contrast to full continuation, which starts from the point specified and continues to the end of the control flow. Delimited continuation starts from this point and ends in another identified point only. It is a partial control flow and not the full one, which is precisely why delimited continuation needs to return a value (see the Wikipedia article). The value usually represents the result of the partial execution.
A sub-continuation is the rest of the current instruction queue for currently "active" function
Your understanding here is slightly hazy. Once way to understand this is to consider a multi-threaded environment. If there is a main thread and a new thread is started from it at some point, then a continuation (from the context of main or child thread), what should it represent? The entire control flow of child and main threads from that point (which in most cases does not make much sense) or just the control flow of the child thread? Sub-continuation is a kind of delimited continuation, the control flow state representation from a point to another point that is part of the sub-process or a sub-process tree. Refer to this paper.
A one-shot continuation is such a continuation that can be executed only once, after being reified into an object
As per this paper, your understanding is correct. The paper seems not to explicitly state if this is a full/classical continuation or a delimited one; however, in next sections, the paper states that full continuations are problematic, so I would assume this type of continuation is a delimited continuation.

Does 'foldp' violate FP's no mutable state principle?

I'm learning about Elm from Seven More Languages in Seven Weeks. The following example confuses me:
import Keyboard
main = lift asText (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
foldp is defined as:
Signal.foldp : (a -> b -> b) -> b -> Signal a -> Signal b
It appears to me that:
the initial value of the accumulator presses is only 0 on the first evaluation of main
after the first evaluation of main it seems that the initial value of presses is whatever the result of function (a -> b -> b), or (\dir presses -> presses + dir.x) in the example, was on the previous evaluation.
If this is indeed the case, then isn't this a violation of functional programming principles, since main now maintains internal state (or at least foldp does)?
How does this work when I use foldp in multiple places in my code? Does it keep multiple internal states, one for each time I use it?
The only other alternative I see is that foldp (in the example) starts counting from 0, so to say, each time it's evaluated, and somehow folds up the entire history provided by Keyboard.arrows. This seems to me to be extremely wasteful and sure to cause out-of-memory exceptions for long run times.
Am I missing something here?

How it works
Yes, foldp keeps some internal state around. Saving the entire history would be wasteful and is not done.
If you use foldp multiple times in your code, doing distinct things or having distinct input signals, then each instance will keep it's own local state. Example:
import Keyboard
plus = (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
minus = (foldp (\dir presses -> presses - dir.x) 0 Keyboard.arrows)
showThem p m = flow down (map asText [p, m])
main = lift2 showThem plus minus
But if you use the resulting signal from a foldp twice, only one foldp instance will be in your compiled program, the resulting changes will just be used in two place:
import Keyboard
plus = (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
showThem p m = flow down (map asText [p, m])
main = lift2 showThem plus plus
The main question
If this is indeed the case, then isn't this a violation of functional programming principles, since main now maintains internal state (or at least foldp does)?
Functional programming doesn't have some great canonical definition that everybody uses. There are many examples of functional programming languages that allow for the use of mutable state. Some of these programming languages show you that a value is mutable in the type-system (you could see Haskell's State a type as such, it really depends on your viewpoint though).
But what is mutable state? What is a mutable value? It's a value inside the program, that is mutable. That is, it can change. It can be different things at different times. Ah, but we know how Elm calls values at change over time! That's a Signal.
So really a Signal in Elm is a value that can change over time, and can therefore be seen as a variable, a mutable value, or mutable state. It's just that we manage this value very strictly by allowing only a few well-chosen manipulations on Signals. Such a Signal can be based on other Signals in your program, or come from a library or come from the outside world (think of inputs like Mouse.position). And who knows how the outside world came up with that signal! So allowing your own Signals to be based on the past value of Signals is actually ok.
Conclusion / TL;DR
You could see Signal as a safety wrapper around mutable state. We assume that signals that come from the outside world (as input to your program) are not predictable, but because we have this safety wrapper that only allows lift/sample/filter/foldp, the program you write is otherwise completely predictable. Side-effects are contained and managed, therefore I think it's still "functional programming".

You're confusing an implementation detail with a conceptual detail. Every functional programming language eventually gets translated down to assembly code, which is decidedly imperative. That doesn't mean you can't have purity at the language level.
Don't think of main as being repeatedly evaluated, returning different results every time. A Signal is conceptually an infinite list of values. main takes an infinite list of keyboard arrows as input and translates that into an infinite list of elements. Given the same list of arrows, it will always return the exact same list of elements, without side effects. At this level of abstraction, it is therefore a pure function.
Now, it so happens that we are only interested in the last element of the sequence. This allows for some optimizations in the implementation, one of which is storing the accumulated value. What's important is that the implementation is referentially transparent. From the language's point of view, you're getting the exact same answer as if you stored the entire sequence and recomputed it from scratch every time a value is added to the end. You get the same output given the same input. The only difference is storage space and execution time.
In other words, the whole idea of functional programming is not to eliminate state tracking, but to abstract it away from the purview of the programmer. Programmers get to play in the ideal world, while the compiler and runtime slave away in the sewers of mutable state to make the ideal world possible for the rest of us.

You should note that "doesnt maintain internal state" isn't really strong definition of FP. Its much more like an implementation constraint. What definition I like more is "built from pure functions". Without diving deep, in plain English it means that all functions return same output when given same input. This definition unlike previous gives you huge reasoning power and a simple way to check whether some program follows it while keeping some optimization space on current hardware.
Given reformulated restriction functional languages are free to use mutables as long as it modelled with pure functions. Answering your question, elm programs built out of pure functions so its probably a functional language. Elm uses special data structure, Signal, to model outside world interactions and internal state as well as any other functional language does.

Why are Asynchronous processes not called Synchronous?

So I'm a little confused by this terminology.
Everyone refers to "Asynchronous" computing as running different processes on seperate threads, which gives the illusion that these processes are running at the same time.
This is not the definition of the word asynchronous.
a⋅syn⋅chro⋅nous
–adjective
1. not occurring at the same time.
2. (of a computer or other electrical machine) having each operation started only after the preceding operation is completed.
What am I not understanding here?

It means that the two threads are not running in sync, that is, they are not both running on the same timeline.
I think it's a case of computer scientists being too clever about their use of words.
Synchronisation, in this context, would suggest that both threads start and end at the same time. Asynchrony in this sense, means both threads are free to start, execute and end as they require.

The word "synchronous" implies that a function call will be synchronized with some other event.
Asynchronous implies that no such synchronization occurs.
It seems like the definition that you have there should really be the definition for "concurrent," or something. That definition looks wrong.
PS:
Here is the wiktionary definition:
asynchronous
Not synchronous; occurring at different times.
(computing, of a request or a message) allowing the client to continue during processing.
Which just so happens to be the exact opposite of what you posted.

I believe that the term was first used for synchronous vs. asynchronous communication. There synchronous means that the two communicating parts have a common clock signal that they run by, so they run in parallel. Asynchronous communication instead has a ready signal, so one part asks for data and gets a signal back when it's available.
The terms was then adapted to processes, but as there are obvious differences some aspects of the terms work differently. For a single thread process the natural way to request for something to be done is to make a synchronous call that transfers control to the subprocess, and then control is returned when it's done, and the process continues.
An asynchronous call works just like asynchronous communication in the aspect that you send a request for something to be done, and the process doing it returns a signal when it's done. The difference in the usage of the terms is that for processes it's in the asynchronous processing that the processes runs in parallel, while for communication it is the synchronous communication that run in parallel.
So "computer or electrical machine" is really a too wide scope for making a correct definition of the term, as it's used in slightly different ways for different techniques.

I would guess it's because they are not synchronized ;)
In other words... if one process gets stopped, killed, or is waiting for something, the other will carry on

I think there's a slant that is slightly different to most of the answers here.
Asynchronous means "not happening at the same time".
In the specific case of threading:
Synchronous means "execute this code now".
Asynchronous means "enqueue this work on a different thread that will be executed at some indeterminate time in the future"
This usually allows you to "do two things at once" because of reasons like:
one thread is just waiting (e.g. for data to arrive on a serial port) so is asleep
You have multiple processors, so the two threads can run concurrently.
However, even with 128 processor cores, the case is the same: the work will be executed "at some time in the future" (if perhaps the very near future) rather than "now".

Your second definition is more helpful here:
2. [...] having each operation started only after the preceding operation is completed.
When you make an asynchronous call, that call might not be completed before the next operation is started. When the call is synchronous, it will be.

It really means that an asynchronous event is happening independently of other events whereas a synchronous event would be happening dependent of other events.

It's like: Flammable, Inflammable ( which mean the same thing )
Seriously -- it's just one of those quirks of the English language. It doesn't really make sense. You can try to explain it, but it would be just as easy to justify the reverse meanings.

Many of the answers here are not correct. IN-dependently has a beginning particle that says NOT dependently, just like A-synchronous, but the meaning of dependent and synchronous are not the same! :D
So three dependent persons would wait for an order, because they are dependent to the order, but they wait, so they are not synchronous.
In english and any other language with common roots with a, syn and chrono (italian: asincrono; spanish: asincrónico; french:
asynchrone; greek: a= not syn=together chronos=time)it means exactly the opposite.
The terminology is UTTERLY counter-intiutive. Async functions ARE synchronous, they happen at the same time, and that's their power. They DO NOT wait, they DO NOT depend, they DO NOT hold the user waiting, but all those NOTs refer to anything but synchronicity :)
The only answer possibly right is the CLOCK one, although it is still confusing. My personal interpretation is this story:
"A professor has an office, and he makes SYNCHRONOUS CALLS for students to come. He says out loud in the main university hall: 'Hey guys who wants to talk to me should come at 10 in the morning tomorrow.', or simply puts a sign saying the same stuff.
RESULT: at 10 in the morning you see a long queue. People had the same time so they came in in the same moment and they got "piled up in the process".
So the professor thinks it would be nice for students not to waste time in the queue (and do synchronous operations, that is, do parallel stuff in their lives at the same time, and that's where the confusion comes).
He decides students can substitute him in making ASYNCHRONOUS CALLS, that is, every time a student ends talking with him, the students may, e.g., call another student saying the professor is free to talk, in a room where students may do whatever they like in the meantime. So every student does not have a single SYNCHRONOUS CALL (10 in the morning, the same time for all) but they have 10, 10.10, 10.18, 10.27.. etc. according to the needed time for each discussion in the professor office."
Is that the meaning of having the same clock, #Guffa?

Is Erlang's recursive functions not just a goto?

Just to get it straight in my head. Consider this example bit of Erlang code:
test() ->
receive
{From, whatever} ->
%% do something
test();
{From, somethingelse} ->
%% do something else
test();
end.
Isn't the test() call, just a goto?
I ask this because in C we learned, if you do a function call, the return location is always put on the stack. I can't imagine this must be the case in Erlang here since this would result in a stackoverflow.
We had 2 different ways of calling functions:
goto and gosub.
goto just steered the program flow somewhere else, and gosub remembered where you came from so you could return.
Given this way of thinking, I can look at Erlang's recursion easier, since if I just read: test() as a goto, then there is no problem at all.
hence my question: isn't :Erlang just using a goto instead of remembering the return address on a stack?
EDIT:
Just to clarify my point:
I know goto's can be used in some languages to jump all over the place. But just supose instead of doing someFunction() you can also do: goto someFunction()
in the first example the flow returns, in the second example the flow just continues in someFunction and never returns.
So we limit the normal GOTO behaviour by just being able to jump to method starting points.
If you see it like this, than the Erlang recursive function call looks like a goto.
(a goto in my opinion is a function call without the ability to return where you came from). Which is exactly what is happening in the Erlang example.

A tail recursive call is more of a "return and immediately call this other function" than a goto because of the housekeeping that's performed.
Addressing your newest points: recording the return point is just one bit of housekeeping that's performed when a function is called. The return point is stored in the stack frame, the rest of which must be allocated and initialized (in a normal call), including the local variables and parameters. With tail recursion, a new frame doesn't need to be allocated and the return point doesn't need to be stored (the previous value works fine), but the rest of the housekeeping needs to be performed.
There's also housekeeping that needs to be performed when a function returns, which includes discarding locals and parameters (as part of the stack frame) and returning to the call point. During tail recursive call, the locals for the current function are discarded before invoking the new function, but no return happens.
Rather like how threads allow for lighter-weight context switching than processes, tail calls allow for lighter-weight function invocation, since some of the housekeeping can be skipped.
The "goto &NAME" statement in Perl is closer to what you're thinking of, but not quite, as it discards locals. Parameters are kept around for the newly invoked function.
One more, simple difference: a tail call can only jump to a function entry point, while a goto can jump most anywhere (some languages restrict the target of a goto, such as C, where goto can't jump outside a function).

You are correct, the Erlang compiler will detect that it is a tail recursive call, and instead of moving on on the stack, it reuses the current function's stack space.
Furthermore it is also able to detect circular tail-recursion, e.g.
test() -> ..., test2().
test2() -> ..., test3().
test3() -> ..., test().
will also be optimized.
The "unfortunate" side-effect of this is that when you are tracing function calls, you will not be able to see each invocation of a tail recursive function, but the entry and exit point.

You've got two questions here.
First, no, you're not in any danger of overrunning the stack in this case because these calls to test() are both tail-recursive.
Second, no, function calls are not goto, they're function calls. :) The thing that makes goto problematic is that it bypasses any structure in your code. You can jump out of statements, jump into statements, bypass assignments...all kinds of screwiness. Function calls don't have this problem because they have an obvious flow of control.

I think the difference here is between a "real" goto and what can in some cases seem like a goto. In some special cases the compiler can detect that it is free to cleanup the stack of the current function before calling another function. This is when the call is the last call in a function. The difference is, of course, that as in any other call you can pass arguments to the new function.
As others have pointed out this optimisation is not restricted to recursive calls but to all last calls. This is used in the "classic" way of programming FSMs.

It's a goto in the same why that if is goto and while is goto. It is implemented using (the moral equivalent of) goto, but it does not expose the full shoot-self-in-foot potential of goto directly to the programmer.

In fact, these recursive functions are the ultimate GOTO according to Guy Steele.

Here's a more general answer, which supercedes my earlier answer based on call-stacks. Since the earlier answer has been accepted, I won't replace the text.
Prologue
Some architectures don't have things they call "functions" that are "called", but do have something analogous (messaging may call them "methods" or "message handlers"; event based architectures have "event handlers" or simply "handlers"). I'll be using the terms "code block" and "invocation" for the general case, though (strictly speaking) "code block" can include things that aren't quite functions. You can substitute the appropriately inflected form of "call" for "invocation" or "invoke", as I might in a few places. The features of an architecture that describe invocation are sometimes called "styles", as in "continuation passing style" (CPS), though this isn't previously an official term. To keep things from being too abstract, we'll examine call stack, continuation passing, messaging (à la OOP) and event handling invocation styles. I should specify the models I'm using for these styles, but I'm leaving them out in the interest of space.
Invocation Features
or, C Is For Continuation, Coordination and Context, That's Good Enough For Me
Hohpe identifies three nicely alliterative invocation features of the call-stack style: Continuation, Coordination, Context (all capitalized to distinguish them from other uses of the words).
Continuation decides where execution will continue when a code block finishes. The "Continuation" feature is related to "first-class continuations" (often simply called "continuations", including by me), in that continuations make the Continuation feature visible and manipulable at a programmatic level.
Coordination means code doesn't execute until the data it needs is ready. Within a single call stack, you get Coordination for free because the program counter won't return to a function until a called function finishes. Coordination becomes an issue in (e.g.) concurrent and event-driven programming, the former because a data producer may fall behind a data consumer and the latter because when a handler fires an event, the handler continues immediately without waiting for a response.
Context refers to the environment that is used to resolve names in a code block. It includes allocation and initialization of the local variables, parameters and return value(s). Parameter passing is also covered by the calling convention (keeping up the alliteration); for the general case, you could split Context into a feature that covers locals, one that covers parameters and another for return values. For CPS, return values are covered by parameter passing.
The three features aren't necessarily independent; invocation style determines their interrelationships. For instance, Coordination is tied to Continuation under the call-stack style. Continuation and Context are connected in general, since return values are involved in Continuation.
Hohpe's list isn't necessarily exhaustive, but it will suffice to distinguish tail-calls from gotos. Warning: I might go off on tangents, such as exploring invocation space based on Hohpe's features, but I'll try to contain myself.
Invocation Feature Tasks
Each invocation feature involves tasks to be completed when invoking a code block. For Continuation, invoked code blocks are naturally related by a chain of invoking code. When a code block is invoked, the current invocation chain (or "call chain") is extended by placing a reference (an "invocation reference") to the invoking code at the end of the chain (this process is described more concretely below). Taking into account invocation also involves binding names to code blocks and parameters, we see even non-bondage-and-discipline languages can have the same fun.
Tail Calls
or, The Answer
or, The Rest Is Basically Unnecessary
Tail calling is all about optimizing Continuation, and it's a matter of recognizing when the main Continuation task (recording an invocation reference) can be skipped. The other feature tasks stand on their own. A "goto" represents optimizing away tasks for Continuation and Context. That's pretty much why a tail call isn't a simple "goto". What follows will flesh out what tail calls look like in various invocation styles.
Tail Calls In Specific Invocation Styles
Different styles arrange invocation chains in different structures, which I'll call a "tangle", for lack of a better word. Isn't it nice that we've gotten away from spaghetti code?
With a call-stack, there's only one invocation chain in the tangle; extending the chain means pushing the program counter. A tail call means no program counter push.
Under CPS, the tangle consists of the extant continuations, which form a reverse arborescence (a directed tree where every edge points towards a central node), where each path back to the center is a invocation chain (note: if the program entry point is passed a "null" continuation, the tangle can be a whole forest of reverse arborescences). One particular chain is the default, which is where an invocation reference is added during invocation. Tail calls won't add an invocation reference to the default invocation chain. Note that "invocation chain" here is basically synonymous with "continuation", in the sense of "first class continuation".
Under message passing, the invocation chain is a chain of blocked methods, each waiting for a response from the method before it in the chain. A method that invokes another is a "client"; the invoked method is a "supplier" (I'm purposefully not using "service", though "supplier" isn't much better). A messaging tangle is a set of unconnected invocation chains. This tangle structure is rather like having multiple thread or process stacks. When the method merely echos another method's response as its own, the method can have its client wait on its supplier rather than itself. Note that this gives a slightly more general optimization, one that involves optimizing Coordination as well as Continuation. If the final portion of a method doesn't depend on a response (and the response doesn't depend on the data processed in the final portion), the method can continue once it's passed on its client's wait dependency to its supplier. This is analogous to launching a new thread, where the final portion of the method becomes the thread's main function, followed by a call-stack style tail call.
What About Event Handling Style?
With event handling, invocations don't have responses and handlers don't wait, so "invocation chains" (as used above) isn't a useful concept. Instead of a tangle, you have priority queues of events, which are owned by channels, and subscriptions, which are lists of listener-handler pairs. In some event driven architectures, channels are properties of listeners; every listener owns exactly one channel, so channels become synonymous with listeners. Invoking means firing an event on a channel, which invokes all subscribed listener-handlers; parameters are passed as properties of the event. Code that would depend on a response in another style becomes a separate handler under event handling, with an associated event. A tail call would be a handler that fires the event on another channel and does nothing else afterwards. Tail call optimization would involve re-subscribing listeners for the event from the second channel to the first, or possibly having the handler that fired the event on the first channel instead fire on the second channel (an optimization made by the programmer, not the compiler/interpreter). Here's what the former optimization looks like, starting with the un-optimized version.
Listener Alice subscribes to event "inauguration" on BBC News, using handler "party"
Alice fires event "election" on channel BBC News
Bob is listening for "election" on BBC News, so Bob's "openPolls" handler is invoked
Bob subscribes to event "inauguration" on channel CNN.
Bob fires event "voting" on channel CNN
Other events are fired & handled. Eventually, one of them ("win", for example) fires event "inauguration" on CNN.
Bob's barred handler fires "inauguration" on BBC News
Alice's inauguration handler is invoked.
And the optimized version:
Listener Alice subscribes to event "inauguration" on BBC News
Alice fires event "election" on channel BBC News
Bob is listening for "election" on BBC News, so Bob's "openPolls" handler is invoked
Bob subscribes anyone listening for "inauguration" on BBC News to the inauguration event on CNN*.
Bob fires event "voting" on channel CNN
Other events are fired & handled. Eventually, one of them fires event "inauguration" on CNN.
Alice's inauguration handler is invoked for the inauguration event on CNN.
Note tail calls are trickier (untenable?) under event handling because they have to take into account subscriptions. If Alice were later to unsubscribe from "inauguration" on BBC News, the subscription to inauguration on CNN would also need to be canceled. Additionally, the system must ensure it doesn't inappropriately invoke a handler multiple times for a listener. In the above optimized example, what if there's another handler for "inauguration" on CNN that fires "inauguration" on BBC News? Alice's "party" event will be fired twice, which may get her in trouble at work. One solution is to have *Bob unsubscribe all listeners from "inauguration" on BBC News in step 4, but then you introduce another bug wherein Alice will miss inauguration events that don't come via CNN. Maybe she wants to celebrate both the U.S. and British inaugurations. These problems arise because there are distinctions I'm not making in the model, possibly based on types of subscriptions. For instance, maybe there's a special kind of one-shot subscription (like System-V signal handlers) or some handlers unsubscribe themselves, and tail call optimization is only applied in these cases.
What's next?
You could go on to more fully specify invocation feature tasks. From there, you could figure out what optimizations are possible, and when they can be used. Perhaps other invocation features could be identified. You could also think of more examples of invocation styles. You could also explore the dependencies between invocation features. For instance, synchronous and asynchronous invocation involve explicitly coupling or uncoupling Continuation and Coordination. It never ends.
Get all that? I'm still trying to digest it myself.
References:
Hohpe, Gregor; "Event-Driven Architecture"
Sugalski, Dan; "CPS and tail calls--two great tastes that taste great together"

In this case it is possible to do tail-call optimization, since we don't need to do more work or make use of local variables. So the compiler will convert this into a loop.

(a goto in my opinion is a function call without the ability to return where you came from). Which is exactly what is happening in the erlang example.
That is not what's happening in Erlang, you CAN return to where you came from.
The calls are tail-recursive, which means that it is "sort of" a goto. Make sure you understand what tail-recursion is before you attempt to understand or write any code. Reading Joe Armstrong's book probably isn't a bad idea if you are new to Erlang.
Conceptually, in the case where you call yourself using test() then a call is made to the start of the function using whatever parameters you pass (none in this example) but nothing more is added to the stack. So all your variables are thrown away and the function starts fresh, but you didn't push a new return pointer onto the stack. So it's like a hybrid between a goto and a traditional imperative language style function call like you'd have in C or Java. But there IS still one entry on the stack from the very first call from the calling function. So when you eventually exit by returning a value rather the doing another test() then that return location is popped from the stack and execution resumes in your calling function.

I just don't get continuations!

What are they and what are they good for?
I do not have a CS degree and my background is VB6 -> ASP -> ASP.NET/C#. Can anyone explain it in a clear and concise manner?

Imagine if every single line in your program was a separate function. Each accepts, as a parameter, the next line/function to execute.
Using this model, you can "pause" execution at any line and continue it later. You can also do inventive things like temporarily hop up the execution stack to retrieve a value, or save the current execution state to a database to retrieve later.

You probably understand them better than you think you did.
Exceptions are an example of "upward-only" continuations. They allow code deep down the stack to call up to an exception handler to indicate a problem.
Python example:
try:
broken_function()
except SomeException:
# jump to here
pass
def broken_function():
raise SomeException() # go back up the stack
# stuff that won't be evaluated
Generators are examples of "downward-only" continuations. They allow code to reenter a loop, for example, to create new values.
Python example:
def sequence_generator(i=1):
while True:
yield i # "return" this value, and come back here for the next
i = i + 1
g = sequence_generator()
while True:
print g.next()
In both cases, these had to be added to the language specifically whereas in a language with continuations, the programmer can create these things where they're not available.

A heads up, this example is not concise nor exceptionally clear. This is a demonstration of a powerful application of continuations. As a VB/ASP/C# programmer, you may not be familiar with the concept of a system stack or saving state, so the goal of this answer is a demonstration and not an explanation.
Continuations are extremely versatile and are a way to save execution state and resume it later. Here is a small example of a cooperative multithreading environment using continuations in Scheme:
(Assume that the operations enqueue and dequeue work as expected on a global queue not defined here)
(define (fork)
(display "forking\n")
(call-with-current-continuation
(lambda (cc)
(enqueue (lambda ()
(cc #f)))
(cc #t))))
(define (context-switch)
(display "context switching\n")
(call-with-current-continuation
(lambda (cc)
(enqueue
(lambda ()
(cc 'nothing)))
((dequeue)))))
(define (end-process)
(display "ending process\n")
(let ((proc (dequeue)))
(if (eq? proc 'queue-empty)
(display "all processes terminated\n")
(proc))))
This provides three verbs that a function can use - fork, context-switch, and end-process. The fork operation forks the thread and returns #t in one instance and #f in another. The context-switch operation switches between threads, and end-process terminates a thread.
Here's an example of their use:
(define (test-cs)
(display "entering test\n")
(cond
((fork) (cond
((fork) (display "process 1\n")
(context-switch)
(display "process 1 again\n"))
(else (display "process 2\n")
(end-process)
(display "you shouldn't see this (2)"))))
(else (cond ((fork) (display "process 3\n")
(display "process 3 again\n")
(context-switch))
(else (display "process 4\n")))))
(context-switch)
(display "ending process\n")
(end-process)
(display "process ended (should only see this once)\n"))
The output should be
entering test
forking
forking
process 1
context switching
forking
process 3
process 3 again
context switching
process 2
ending process
process 1 again
context switching
process 4
context switching
context switching
ending process
ending process
ending process
ending process
ending process
ending process
all processes terminated
process ended (should only see this once)
Those that have studied forking and threading in a class are often given examples similar to this. The purpose of this post is to demonstrate that with continuations you can achieve similar results within a single thread by saving and restoring its state - its continuation - manually.
P.S. - I think I remember something similar to this in On Lisp, so if you'd like to see professional code you should check the book out.

One way to think of a continuation is as a processor stack. When you "call-with-current-continuation c" it calls your function "c", and the parameter passed to "c" is your current stack with all your automatic variables on it (represented as yet another function, call it "k"). Meanwhile the processor starts off creating a new stack. When you call "k" it executes a "return from subroutine" (RTS) instruction on the original stack, jumping you back in to the context of the original "call-with-current-continuation" ("call-cc" from now on) and allowing your program to continue as before. If you passed a parameter to "k" then this becomes the return value of the "call-cc".
From the point of view of your original stack, the "call-cc" looks like a normal function call. From the point of view of "c", your original stack looks like a function that never returns.
There is an old joke about a mathematician who captured a lion in a cage by climbing into the cage, locking it, and declaring himself to be outside the cage while everything else (including the lion) was inside it. Continuations are a bit like the cage, and "c" is a bit like the mathematician. Your main program thinks that "c" is inside it, while "c" believes that your main program is inside "k".
You can create arbitrary flow-of-control structures using continuations. For instance you can create a threading library. "yield" uses "call-cc" to put the current continuation on a queue and then jumps in to the one on the head of the queue. A semaphore also has its own queue of suspended continuations, and a thread is rescheduled by taking it off the semaphore queue and putting it on to the main queue.

Basically, a continuation is the ability for a function to stop execution and then pick back up where it left off at a later point in time. In C#, you can do this using the yield keyword. I can go into more detail if you wish, but you wanted a concise explanation. ;-)

I'm still getting "used" to continuations, but one way to think about them that I find useful is as abstractions of the Program Counter (PC) concept. A PC "points" to the next instruction to execute in memory, but of course that instruction (and pretty much every instruction) points, implicitly or explicitly, to the instruction that follows, as well as to whatever instructions should service interrupts. (Even a NOOP instruction implicitly does a JUMP to the next instruction in memory. But if an interrupt occurs, that'll usually involve a JUMP to some other instruction in memory.)
Each potentially "live" point in a program in memory to which control might jump at any given point is, in a sense, an active continuation. Other points that can be reached are potentially active continuations, but, more to the point, they are continuations that are potentially "calculated" (dynamically, perhaps) as a result of reaching one or more of the currently active continuations.
This seems a bit out of place in traditional introductions to continuations, in which all pending threads of execution are explicitly represented as continuations into static code; but it takes into account the fact that, on general-purpose computers, the PC points to an instruction sequence that might potentially change the contents of memory representing a portion of that instruction sequence, thus essentially creating a new (or modified, if you will) continuation on the fly, one that doesn't really exist as of the activations of continuations preceding that creation/modification.
So continuation can be viewed as a high-level model of the PC, which is why it conceptually subsumes ordinary procedure call/return (just as ancient iron did procedure call/return via low-level JUMP, aka GOTO, instructions plus recording of the PC on call and restoring of it on return) as well as exceptions, threads, coroutines, etc.
So just as the PC points to computations to happen in "the future", a continuation does the same thing, but at a higher, more-abstract level. The PC implicitly refers to memory plus all the memory locations and registers "bound" to whatever values, while a continuation represents the future via the language-appropriate abstractions.
Of course, while there might typically be just one PC per computer (core processor), there are in fact many "active" PC-ish entities, as alluded to above. The interrupt vector contains a bunch, the stack a bunch more, certain registers might contain some, etc. They are "activated" when their values are loaded into the hardware PC, but continuations are abstractions of the concept, not PCs or their precise equivalent (there's no inherent concept of a "master" continuation, though we often think and code in those terms to keep things fairly simple).
In essence, a continuation is a representation of "what to do next when invoked", and, as such, can be (and, in some languages and in continuation-passing-style programs, often is) a first-class object that is instantiated, passed around, and discarded just like most any other data type, and much like how a classic computer treats memory locations vis-a-vis the PC -- as nearly interchangeable with ordinary integers.

In C#, you have access to two continuations. One, accessed through return, lets a method continue from where it was called. The other, accessed through throw, lets a method continue at the nearest matching catch.
Some languages let you treat these statements as first-class values, so you can assign them and pass them around in variables. What this means is that you can stash the value of return or of throw and call them later when you're really ready to return or throw.
Continuation callback = return;
callMeLater(callback);
This can be handy in lots of situations. One example is like the one above, where you want to pause the work you're doing and resume it later when something happens (like getting a web request, or something).
I'm using them in a couple of projects I'm working on. In one, I'm using them so I can suspend the program while I'm waiting for IO over the network, then resume it later. In the other, I'm writing a programming language where I give the user access to continuations-as-values so they can write return and throw for themselves - or any other control flow, like while loops - without me needing to do it for them.

Think of threads. A thread can be run, and you can get the result of its computation. A continuation is a thread that you can copy, so you can run the same computation twice.

Continuations have had renewed interest with web programming because they nicely mirror the pause/resume character of web requests. A server can construct a continaution representing a users session and resume if and when the user continues the session.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex