Kotlin: Any performance impact on converting a "normal" function to a blocking suspend function?

I have a function that looks like this:
fun <R> map(block: (T) -> R): Result<R> { ... }
and I'd like to make a suspending version:
suspend fun <R> mapAsync(block: suspend (T) -> R): Result<R> { ... }
The logic in both bodies is identical, but one suspends and one doesn't.
I don't want to duplicate this logic. The only way I found to make this work is to have the map function call the mapAsync function and wrap the result in runBlocking:
fun <R> map(block: (T) -> R): Result<R> =
runBlocking { mapAsync { block(it) } }
So I have two questions:
Are there any performance considerations in taking a "normal" function, passing it as a suspend parameter, and then blocking until the result is ready?
Based on what I've read, it sounds like the initial thread keeps "doing the work" inside the suspend block until it hits the first suspend point. Then, the continuation is put into the wait queue and the initial thread is free to perform other work.
However, in this case, there isn't any "real" suspend point because the actual function is just (T) -> R, though I don't know if the compiler can tell that.
I'm worried that this setup is actually utilizing another thread from the pool that is just notifying my first thread to wake up...
Is there a better way to have a suspend and non-suspend set of functions utilize the same code?

You have encountered the infamous "colored function" problem. The two worlds are indeed separate and, while you can add a superficial layer that unifies them, you can't get it at zero performance cost. This is so fundamental that, even assuming that your suspend block never actually suspends, and the wrapping layer leverages that assumption and doesn't even use runBlocking on it, you will still pay the price of "being ready to suspend". The price isn't huge, though: it means creating a small object per each suspend fun call that holds the data that would normally reside on the thread's native call stack. In your case only the outer block is suspendable, so that's just one such object.
runBlocking runs the coroutine on the thread where you called it and it will finish synchronously on the same thread unless it suspends itself. Therefore your case where you'd have some synchronous code in a suspend block wouldn't suffer an additional performance hit from thread coordination.
If the coroutine does suspend itself, then there will have to be some external worker thread which will react to the event that allows the coroutine to resume, and there will have to be some coordination between that thread and your original runBlocking thread. This is a fundamental mechanism that's there with or without coroutines.

Your approach is correct: runBlocking was specifically designed to serve as a bridge between blocking and non-blocking code. From the documentation:
Runs new coroutine and blocks current thread interruptibly until its
completion. This function should not be used from coroutine. It is
designed to bridge regular blocking code to libraries that are written
in suspending style, to be used in main functions and in tests.
https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/run-blocking.html
Also further read:
https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/basics.md#bridging-blocking-and-non-blocking-worlds
And some interesting videos by Roman Elizarov:
https://youtu.be/_hfBv0a09Jc
https://youtu.be/a3agLJQ6vt8

Related

What is the difference between stackless and stackful coroutines?

Background:
I'm asking this because I currently have an application with many (hundreds to thousands) of threads. Most of those threads are idle a great portion of the time, waiting on work items to be placed in a queue. When a work item comes available, it is then processed by calling some arbitrarily-complex existing code. On some operating system configurations, the application bumps up against kernel parameters governing the maximum number of user processes, so I'd like to experiment with means to reduce the number of worker threads.
My proposed solution:
It seems like a coroutine-based approach, where I replace each worker thread with a coroutine, would help to accomplish this. I can then have a work queue backed by a pool of actual (kernel) worker threads. When an item is placed in a particular coroutine's queue for processing, an entry would be placed into the thread pool's queue. It would then resume the corresponding coroutine, process its queued data, and then suspend it again, freeing up the worker thread to do other work.
Implementation details:
In thinking about how I would do this, I'm having trouble understanding the functional differences between stackless and stackful coroutines. I have some experience using stackful coroutines using the Boost.Coroutine library. I find it's relatively easy to comprehend from a conceptual level: for each coroutine, it maintains a copy of the CPU context and stack, and when you switch to a coroutine, it switches to that saved context (just like a kernel-mode scheduler would).
What is less clear to me is how a stackless coroutine differs from this. In my application, the amount of overhead associated with the above-described queuing of work items is very important. Most implementations that I've seen, like the new CO2 library suggest that stackless coroutines provide much lower-overhead context switches.
Therefore, I'd like to understand the functional differences between stackless and stackful coroutines more clearly. Specifically, I think of these questions:
References like this one suggest that the distinction lies in where you can yield/resume in a stackful vs. stackless coroutine. Is this the case? Is there a simple example of something that I can do in a stackful coroutine but not in a stackless one?
Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?
Are there any limitations on what functions I can call from a stackless coroutine?
If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?

First, thank you for taking a look at CO2 :)
The Boost.Coroutine doc describes the advantage of stackful coroutine well:
stackfulness
In contrast to a stackless coroutine a stackful coroutine
can be suspended from within a nested stackframe. Execution resumes at
exactly the same point in the code where it was suspended before. With
a stackless coroutine, only the top-level routine may be suspended.
Any routine called by that top-level routine may not itself suspend.
This prohibits providing suspend/resume operations in routines within
a general-purpose library.
first-class continuation
A first-class continuation can be passed as
an argument, returned by a function and stored in a data structure to
be used later. In some implementations (for instance C# yield) the
continuation can not be directly accessed or directly manipulated.
Without stackfulness and first-class semantics, some useful execution
control flows cannot be supported (for instance cooperative
multitasking or checkpointing).
What does that mean to you? For example, imagine you have a function that takes a visitor:
template<class Visitor>
void f(Visitor& v);
If you want to transform it into an iterator, with a stackful coroutine you can:
asymmetric_coroutine<T>::pull_type pull_from([](asymmetric_coroutine<T>::push_type& yield)
{
    f(yield);
});
But with a stackless coroutine, there's no way to do so:
generator<T> pull_from()
{
    // yield can only be used here, cannot pass to f
    f(???);
}
In general, stackful coroutine is more powerful than stackless coroutine.
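The same restriction can be seen in JavaScript, whose generators are stackless: yield is only valid directly in the generator's own body, so it cannot be handed to a visitor-style function. The sketch below is only an illustration; forEachLeaf, leaves and the nested-array "tree" are hypothetical names invented for this example, not part of Boost.Coroutine or CO2. The usual workaround is to make every frame that may suspend a generator itself and chain them with yield*.

```javascript
// Hypothetical visitor-style traversal: calls `visit` for every leaf of a nested array.
function forEachLeaf(node, visit) {
  if (Array.isArray(node)) {
    for (const child of node) forEachLeaf(child, visit);
  } else {
    visit(node);
  }
}

// A stackless generator cannot suspend from inside the callback it passes along.
// The following does not even parse, because `yield` is not inside the generator body:
//
// function* leavesBroken(node) {
//   forEachLeaf(node, (v) => { yield v; });
// }

// Workaround: rewrite the traversal so that each nested level is itself a generator.
function* leaves(node) {
  if (Array.isArray(node)) {
    for (const child of node) yield* leaves(child);
  } else {
    yield node;
  }
}

const tree = [1, [2, [3, 4]], 5];

const viaVisitor = [];
forEachLeaf(tree, (v) => viaVisitor.push(v));

const viaGenerator = [...leaves(tree)];
console.log(viaVisitor, viaGenerator);
```

Note that the workaround required rewriting the traversal; the existing forEachLeaf could not be reused, which is exactly the cost the quoted Boost documentation describes.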
So why do we want stackless coroutines? Short answer: efficiency.
A stackful coroutine typically needs to allocate a certain amount of memory to accommodate its runtime stack (which must be large enough), and its context switch is more expensive than a stackless one; e.g. Boost.Coroutine takes 40 cycles while CO2 takes just 7 cycles on average on my machine, because the only thing that a stackless coroutine needs to restore is the program counter.
That said, with language support, probably stackful coroutine can also take the advantage of the compiler-computed max-size for the stack as long as there's no recursion in the coroutine, so the memory usage can also be improved.
Speaking of stackless coroutine, bear in mind that it doesn't mean that there's no runtime-stack at all, it only means that it uses the same runtime-stack as the host side, so you can call recursive functions as well, just that all the recursions will happen on the host's runtime-stack. In contrast, with stackful coroutine, when you call recursive functions, the recursions will happen on the coroutine's own stack.
To answer the questions:
Are there any limitations on the use of automatic storage variables
(i.e. variables "on the stack")?
No. Any such restriction is a limitation of CO2's emulation, not of stackless coroutines as such. With language support, the automatic storage variables visible to the coroutine will be placed in the coroutine's internal storage. Note my emphasis on "visible to the coroutine": if the coroutine calls a function that uses automatic storage variables internally, then those variables will be placed on the runtime stack. More specifically, a stackless coroutine only has to preserve the variables/temporaries that can be used after it is resumed.
To be clear, you can use automatic storage variables in CO2's coroutine body as well:
auto f() CO2_RET(co2::task<>, ())
{
    int a = 1; // not ok
    CO2_AWAIT(co2::suspend_always{});
    {
        int b = 2; // ok
        doSomething(b);
    }
    CO2_AWAIT(co2::suspend_always{});
    int c = 3; // ok
    doSomething(c);
} CO2_END
This works as long as the variable's definition does not precede any await, which is why a is marked "not ok".
Are there any limitations on what functions I can call from a
stackless coroutine?
No.
If there is no saving of stack context for a stackless coroutine,
where do automatic storage variables go when the coroutine is
running?
Answered above, a stackless coroutine doesn't care about the automatic storage variables used in the called functions, they'll just be placed on the normal runtime-stack.
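To make "the coroutine's internal storage" more concrete, here is a rough sketch in JavaScript of a generator next to a hand-written state machine that behaves the same way. It is not how CO2 or any particular compiler actually lowers coroutines; runningSum, RunningSumFrame and square are hypothetical names for this illustration. The point is that only the locals that must survive a suspension (total and i) are kept in the coroutine's own object, while the locals of the called function square live on the ordinary call stack and are gone before the coroutine suspends.

```javascript
// An ordinary function: its local `result` lives on the normal call stack.
function square(x) {
  const result = x * x;
  return result;
}

// Generator version: `total` and `i` must survive across every yield.
function* runningSum(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += square(values[i]);
    yield total;
  }
}

// Roughly equivalent hand-written frame: the surviving locals become fields,
// and `i` doubles as the saved resume position.
class RunningSumFrame {
  constructor(values) {
    this.values = values;
    this.total = 0;
    this.i = 0;
  }
  next() {
    if (this.i >= this.values.length) return { value: undefined, done: true };
    this.total += square(this.values[this.i]);
    this.i++;
    return { value: this.total, done: false };
  }
}

const fromGenerator = [...runningSum([1, 2, 3])];

const fromFrame = [];
const frame = new RunningSumFrame([1, 2, 3]);
for (let step = frame.next(); !step.done; step = frame.next()) {
  fromFrame.push(step.value);
}
console.log(fromGenerator, fromFrame);
```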
If you have any doubt, just check CO2's source code, it may help you understand the mechanics under the hood ;)

What you want are user-land threads/fibers: usually you want to suspend your code (running in a fiber) deep inside a nested call stack (for instance, while parsing messages from a TCP connection). In this case you cannot use stackless context switching (the application stack is shared between stackless coroutines, so the stack frames of called subroutines would be overwritten).
You can use something like boost.fiber which implements user-land threads/fibers based on boost.context.

What is the difference between Asynchronous calls and Callbacks

I'm a bit confused about the difference between asynchronous calls and callbacks.
I read these posts, which explain callbacks, but none of the answers addresses how they differ from asynchronous calls.
Is this Callbacks = Lambda Expressions?
Callbacks are running in a different thread?
Can anyone explain this in plain, simple English?

Very simply, a callback needn't be asynchronous.
http://docs.apigee.com/api-baas/asynchronous-vs-synchronous-calls
Synchronous:
If an API call is synchronous, it means that code execution will
block (or wait) for the API call to return before continuing. This
means that until a response is returned by the API, your application
will not execute any further, which could be perceived by the user as
latency or performance lag in your app. Making an API call
synchronously can be beneficial, however, if there is code in your app
that will only execute properly once the API response is received.
Asynchronous:
Asynchronous calls do not block (or wait) for the API call to return
from the server. Execution continues on in your program, and when the
call returns from the server, a "callback" function is executed.
In Java, C and C#, "callbacks" are usually synchronous (with respect to a "main event loop").
In Javascript, on the other hand, callbacks are usually asynchronous - you pass a function that will be invoked ... but other events will continue to be processed until the callback is invoked.
If you don't care what Javascript events occur in which order - great. Otherwise, one very powerful mechanism for managing asynchronous behavior in Javascript is to use "promises":
http://www.html5rocks.com/en/tutorials/es6/promises/
PS:
To answer your additional questions:
Yes, a callback may be a lambda - but it's not a requirement.
In Javascript, just about every callback will be an "anonymous function" (basically a "lambda expression").
Yes, callbacks may be invoked from a different thread - but it's certainly not a requirement.
Callbacks may also (and often do) spawn a thread (thus making themselves "asynchronous").
Hope that helps
====================================================================
Hi, Again:
Q: @paulsm4 can you please elaborate with an example how the callback
and asynchronous call works in the execution flow? That will be
greatly helpful
First we need to agree on a definition for "callback". Here's a good one:
https://en.wikipedia.org/wiki/Callback_%28computer_programming%29
In computer programming, a callback is a piece of executable code that
is passed as an argument to other code, which is expected to call back
(execute) the argument at some convenient time. The invocation may be
immediate as in a synchronous callback, or it might happen at a later
time as in an asynchronous callback.
We must also define "synchronous" and "asynchronous". Basically, if a callback does all its work before returning to the caller, it's "synchronous". If it can return to the caller immediately after it's invoked, so that the caller and the callback can work in parallel, then it's "asynchronous".
The problem with synchronous callbacks is they can appear to "hang". The problem with asynchronous callbacks is you can lose control of "ordering" - you can't necessarily guarantee that "A" will occur before "B".
Common examples of callbacks include:
a) a button press handler (each different "button" will have a different "response"). These are usually invoked "asynchronously" (by the GUI's main event loop).
b) a sort "compare" function (so a common "sort()" function can handle different data types). These are usually invoked "synchronously" (called directly by your program).
A CONCRETE EXAMPLE:
a) I have a "C" language program with a "print()" function.
b) "print()" is designed to use one of three callbacks: "PrintHP()", "PrintCanon()" and "PrintPDF()".
c) "PrintPDF()" calls a library to render my data in PDF. It's synchronous - the program doesn't return back from "print()" until the .pdf rendering is complete. It usually goes pretty quickly, so there's no problem.
d) I've coded "PrintHP()" and "PrintCanon()" to spawn threads to do the I/O to the physical printer. "print()" exits as soon as the thread is created; the actual "printing" goes on in parallel with program execution. These two callbacks are "asynchronous".
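The print example above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the function names mirror the ones in the example, the log array is invented for the demonstration, and setTimeout stands in for the thread that the C program would spawn (JavaScript has no direct equivalent here).

```javascript
const log = [];

// Synchronous callback: does all of its work before returning to print().
function printPdf(data) {
  log.push(`pdf rendered: ${data}`);
}

// Asynchronous callback: schedules the work and returns immediately.
// ASSUMPTION: setTimeout plays the role of the spawned printer thread.
function printHp(data) {
  setTimeout(() => log.push(`hp finished: ${data}`), 0);
}

// print() only knows it was handed a callback; it cannot tell which kind it is.
function print(data, backend) {
  backend(data);
  log.push("print() returned");
}

print("report", printPdf); // the PDF is fully rendered before print() returns
print("report", printHp);  // print() returns before the "printer" has finished

// Show the final order once the deferred work has had a chance to run.
setTimeout(() => console.log(log), 10);
```

With printPdf, the "pdf rendered" entry precedes the matching "print() returned" entry. With printHp, "print() returned" is logged first and "hp finished" only appears later, which is the loss of ordering control mentioned above.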
Q: Make sense? Does that help?

They are quite similar but this is just mho.
When you use callbacks, you specify which method you should be called back on, and you rely on the methods you call to call you back. You could specify your callback to end up anywhere, and you are not guaranteed to be called back.
In Asynchronous programming, the call stack should unwind to the starting position, just as in normal synchronous programming.
Caveat: I am specifically thinking of the C# await functionality as there are other async techniques.

I wanted to comment on paulsm4's answer above, but I don't have enough reputation, so I have to post a new answer.
According to Wikipedia, "a callback is a piece of executable code that is passed as an argument to other code, which is expected to call back (execute) the argument at some convenient time. The invocation may be immediate as in a synchronous callback, or it might happen at a later time as in an asynchronous callback." So the qualifiers "synchronous" and "asynchronous" apply to the callback, which is the key point. We often confuse them with an "asynchronous function", which is actually the caller function. For example,
const xhr = new XMLHttpRequest();
xhr.addEventListener('loadend', () => {
log.textContent = `${log.textContent}Finished with status: ${xhr.status}`;
});
xhr.open('GET', 'https://raw.githubusercontent.com/mdn/content/main/files/en-us/_wikihistory.json');
xhr.send();
Here, xhr.send() is an asynchronous caller function, while the anonymous function passed to xhr.addEventListener is an asynchronous callback function.
For clarification, the following is an example of a synchronous callback:
function doOperation(callback) {
  const name = "world";
  callback(name); // invoked immediately, before doOperation returns
}
function doStep(name) {
  console.log(`hello, ${name}`);
}
doOperation(doStep);
Now, let's answer the specific questions:
Is this Callbacks = Lambda Expressions?
A: No. A callback is just a normal function; it can be named or anonymous (a lambda expression).
Callbacks are running in a different thread?
A: If callbacks are synchronous, they run in the same thread as the caller function. If callbacks are asynchronous, they run later, after the caller function has returned, so they do not block the caller; depending on the platform this may be on another thread or, as in JavaScript, on the same thread via the event loop.

A call is Synchronous: It returns control to the caller when it's done.
A call is Async.: Otherwise.

Main loop in event-driven programming and alternatives

To the best of my knowledge, event-driven programs require a main loop such as
while (1) {
}
I am just curious whether this while loop causes high CPU usage. Is there any other way to implement event-driven programs without using a main loop?

Your example is misleading. Usually, an event loop looks something like this:
Event e;
while ((e = get_next_event()) != E_QUIT)
{
    handle(e);
}
The crucial point is that the function call to our fictitious get_next_event() pumping function will be generous and encourage a context switch or whatever scheduling semantics apply to your platform, and if there are no events, the function would probably allow the entire process to sleep until an event arrives.
So in practice there's nothing to worry about, and no, there's not really any alternative to an unbounded loop if you want to process an unbounded amount of information during your program's runtime.
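As a rough illustration of a loop that sleeps instead of spinning, here is a small JavaScript sketch. It is not how any real GUI toolkit or operating system implements its event pump; EventQueue, take, put, QUIT and handled are hypothetical names for this example. The promise returned by take() plays the role of the fictitious get_next_event(): when the queue is empty the loop is simply suspended and consumes no CPU until put() wakes it.

```javascript
// Minimal event queue: take() returns a promise that settles once an event is available.
class EventQueue {
  constructor() {
    this.events = [];  // events waiting to be handled
    this.waiters = []; // resolvers of take() calls that found the queue empty
  }
  put(event) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(event);    // wake the sleeping loop
    else this.events.push(event); // nobody is waiting, so buffer the event
  }
  take() {
    if (this.events.length > 0) return Promise.resolve(this.events.shift());
    // Nothing to do: park until put() is called. No busy waiting happens here.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

const QUIT = Symbol("quit");
const handled = [];

async function runLoop(queue) {
  let e;
  while ((e = await queue.take()) !== QUIT) {
    handled.push(e); // stand-in for handle(e)
  }
}

const queue = new EventQueue();
const loopDone = runLoop(queue);

queue.put("click");
setTimeout(() => queue.put("keypress"), 10);
setTimeout(() => queue.put(QUIT), 20);

loopDone.then(() => console.log(handled));
```

Between events the loop body does not execute at all, so the "infinite" loop only costs CPU time while there is something to handle.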

Usually, the problem with a loop like this is that while it's doing one piece of work, it can't be doing anything else (e.g. Windows SDK's old 'cooperative' multitasking). The next naive jump up from this is generally to spawn a thread for each piece of work, but that's incredibly dangerous. Most people would end up with an executor that generally has a thread pool inside. Then, the handle call is actually just enqueueing the work and the next available thread dequeues it and executes it. The number of concurrent threads remains fixed as the total number of worker threads in the pool and when threads don't have anything to do, they are not eating CPU.

What is the difference between asynchronous and non-blocking calls? Also between blocking and synchronous

What is the difference between asynchronous and non-blocking calls? Also between blocking and synchronous calls (with examples please)?

In many circumstances they are different names for the same thing, but in some contexts they are quite different. So it depends. Terminology is not applied in a totally consistent way across the whole software industry.
For example in the classic sockets API, a non-blocking socket is one that simply returns immediately with a special "would block" error message, whereas a blocking socket would have blocked. You have to use a separate function such as select or poll to find out when is a good time to retry.
But asynchronous sockets (as supported by Windows sockets), or the asynchronous IO pattern used in .NET, are more convenient. You call a method to start an operation, and the framework calls you back when it's done. Even here, there are basic differences. Asynchronous Win32 sockets "marshal" their results onto a specific GUI thread by passing Window messages, whereas .NET asynchronous IO is free-threaded (you don't know what thread your callback will be called on).
So they don't always mean the same thing. To distil the socket example, we could say:
Blocking and synchronous mean the same thing: you call the API, it hangs up the thread until it has some kind of answer and returns it to you.
Non-blocking means that if an answer can't be returned rapidly, the API returns immediately with an error and does nothing else. So there must be some related way to query whether the API is ready to be called (that is, to simulate a wait in an efficient way, to avoid manual polling in a tight loop).
Asynchronous means that the API always returns immediately, having started a "background" effort to fulfil your request, so there must be some related way to obtain the result.
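The difference between the non-blocking and the asynchronous shape of an API can be sketched with a small in-memory example in JavaScript. Mailbox, tryReceive, receiveAsync and deliver are hypothetical names invented here and do not correspond to the sockets API or .NET; a truly blocking variant is left out because single-threaded JavaScript cannot express it directly.

```javascript
class Mailbox {
  constructor() {
    this.messages = [];  // delivered but not yet received
    this.callbacks = []; // pending asynchronous receive requests
  }

  // Non-blocking: always returns at once. `undefined` plays the role of the
  // "would block" error, and the caller has to decide when to retry.
  tryReceive() {
    return this.messages.shift();
  }

  // Asynchronous: always returns at once, and the result is handed to the
  // callback when (or if) it becomes available.
  receiveAsync(callback) {
    if (this.messages.length > 0) callback(this.messages.shift());
    else this.callbacks.push(callback);
  }

  deliver(message) {
    const callback = this.callbacks.shift();
    if (callback) callback(message);
    else this.messages.push(message);
  }
}

const box = new Mailbox();

const firstTry = box.tryReceive(); // nothing there yet: the caller must come back later

const received = [];
box.receiveAsync((m) => received.push(m)); // returns immediately, nothing received yet

box.deliver("book");     // the pending callback is invoked with "book"
box.deliver("magazine"); // no callback is waiting, so the message is buffered

const secondTry = box.tryReceive(); // this time a message is available
console.log(firstTry, received, secondTry);
```

With tryReceive the caller carries the burden of retrying; with receiveAsync that burden moves into the API, which calls back when it has an answer.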

synchronous / asynchronous is to describe the relation between two modules.
blocking / non-blocking is to describe the situation of one module.
An example:
Module X: "I".
Module Y: "bookstore".
X asks Y: do you have a book named "c++ primer"?
1) blocking: Before Y answers X, X keeps waiting there for the answer. Now X (one module) is blocking. Are X and Y two threads, two processes, one thread, or one process? We DON'T know.
2) non-blocking: Before Y answers X, X leaves and does other things. Will X come back every two minutes to check whether Y has finished its job? Or will X not come back until Y calls him? We don't know. We only know that X can do other things before Y finishes its job. Here X (one module) is non-blocking. Are X and Y two threads, two processes, or one process? We DON'T know. BUT we are sure that X and Y couldn't be one thread.
3) synchronous: Before Y answers X, X keeps waiting there for the answer. It means that X can't continue until Y finishes its job. Now we say: X and Y (two modules) are synchronous. Are X and Y two threads, two processes, one thread, or one process? We DON'T know.
4) asynchronous: Before Y answers X, X leaves and can do other jobs. X won't come back until Y calls him. Now we say: X and Y (two modules) are asynchronous. Are X and Y two threads, two processes, or one process? We DON'T know. BUT we are sure that X and Y couldn't be one thread.
Please pay attention to the sentences in 2) and 4) that describe when X comes back. Why does the one in 2) contain two cases (X may come back periodically to check, or X won't come back until Y calls him), whereas the one in 4) contains only one (X won't come back until Y calls him)? This is the key to the difference between non-blocking and asynchronous.
Let me try to explain the four words with another way:
blocking: OMG, I'm frozen! I can't move! I have to wait for that specific event to happen. If that happens, I would be saved!
non-blocking: I was told that I had to wait for that specific event to happen. OK, I understand and I promise that I would wait for that. But while waiting, I can still do some other things, I'm not frozen, I'm still alive, I can jump, I can walk, I can sing a song etc.
synchronous: My mom is gonna cook, she sends me to buy some meat. I just said to my mom: We are synchronous! I'm so sorry but you have to wait even if I might need 100 years to get some meat back...
asynchronous: We will make a pizza, and we need tomatoes and cheese. Now I say: Let's go shopping. I'll buy some tomatoes and you will buy some cheese. We needn't wait for each other because we are asynchronous.
Here is a typical example about non-blocking & synchronous:
// thread X
while (true)
{
    msg = recv(Y, NON_BLOCKING_FLAG);
    if (msg is not empty)
    {
        break;
    }
    else
    {
        sleep(2000); // 2 sec
    }
}
// thread Y
// prepare the book for X
send(X, book);
You can see that this design is non-blocking: most of the time the loop does nothing useful, but in the CPU's eyes X is running, which means that X is non-blocking (if you want, you can replace sleep(2000) with any other code). X and Y (two modules) are nevertheless synchronous, because X can't go on to do anything else (X can't jump out of the loop) until it gets the book from Y.
Normally in this case, making X blocking is much better, because the non-blocking version wastes resources on a pointless loop. But this example is good for understanding one fact: non-blocking doesn't mean asynchronous.
The four words confuse us easily. What we should remember is that they serve the design of an architecture. Learning how to design a good architecture is the only way to tell them apart.
For example, we may design such a kind of architecture:
// Module X = Module X1 + Module X2
// Module X1
while (true)
{
    msg = recv(many_other_modules, NON_BLOCKING_FLAG);
    if (msg is not null)
    {
        if (msg == "done")
        {
            break;
        }
        // create a thread to process msg
    }
    else
    {
        sleep(2000); // 2 sec
    }
}
// Module X2
broadcast("I got the book from Y");
// Module Y
// prepare the book for X
send(X, book);
In the example here, we can say that
X1 is non-blocking
X1 and X2 are synchronous
X and Y are asynchronous
If you need, you can also describe those threads created in X1 with the four words.
One more time: the four words serve the design of an architecture. So what we need is to build a proper architecture, instead of distinguishing the four words like a language lawyer. If you run into cases where you can't distinguish the four words clearly, forget about them and use your own words to describe your architecture.
So the more important questions are: When do we use synchronous instead of asynchronous? When do we use blocking instead of non-blocking? Is making X1 blocking better than non-blocking? Is making X and Y synchronous better than asynchronous? Why is Nginx non-blocking? Why is Apache blocking? These questions are what you must figure out.
To make a good choice, you must analyze your needs and test the performance of different architectures. There is no single architecture that suits every need.

Asynchronous refers to something done in parallel, say in another thread.
Non-blocking often refers to polling, i.e. checking whether given condition holds (socket is readable, device has more data, etc.)

Synchronous is defined as happening at the same time (in predictable timing, or in predictable ordering).
Asynchronous is defined as not happening at the same time. (with unpredictable timing or with unpredictable ordering).
This is what causes the first confusion, namely the idea that asynchronous is some sort of synchronization scheme. Yes, it is used to mean that, but in actuality it describes processes that happen unpredictably with regard to when or in what order they run. Such events often need to be synchronized in order to make them behave correctly, and multiple synchronization schemes exist to do so: one of them is called blocking, another non-blocking, and yet another, confusingly, asynchronous.
So you see, the whole problem is about finding a way to synchronize an asynchronous behavior, because you've got some operation that needs the response of another before it can begin. Thus it's a coordination problem, how will you know that you can now start that operation?
The simplest solution is known as blocking.
Blocking is when you simply choose to wait for the other thing to be done and return you a response before moving on to the operation that needed it.
Say you need to put butter on toast, so you first need to toast the bread. The way you'd coordinate them is that you'd first put the bread in the toaster, then stare endlessly at the toaster until it pops the toast, and then you'd proceed to put butter on it.
It's the simplest solution, and works very well. There's no real reason not to use it, unless you happen to also have other things you need to be doing which don't require coordination with the operations. For example, doing some dishes. Why wait idle staring at the toaster constantly for the toast to pop, when you know it'll take a bit of time, and you could wash a whole dish while it finishes?
That's where two other solutions known respectively as non-blocking and asynchronous come into play.
Non-blocking is when you choose to do other unrelated things while you wait for the operation to be done. Checking back on the availability of the response as you see fit.
So instead of looking at the toaster for it to pop. You go and wash a whole dish. And then you peek at the toaster to see if the toasts have popped. If they haven't, you go wash another dish, checking back at the toaster between each dish. When you see the toasts have popped, you stop washing the dishes, and instead you take the toast and move on to putting butter on them.
Having to constantly check on the toasts can be annoying though, imagine the toaster is in another room. In between dishes you waste your time going to that other room to check on the toast.
Here comes asynchronous.
Asynchronous is when you choose to do other unrelated things while you wait for the operation to be done. Instead of checking on it yourself, though, you delegate the work of checking to something else (it could be the operation itself or a watcher), and you have that thing notify and possibly interrupt you when the response is available, so you can proceed to the other operation that needed it.
It's a weird terminology. It doesn't make a whole lot of sense, since all these solutions are ways to create synchronous coordination of dependent tasks. That's why I prefer to call it evented.
So for this one, you decide to upgrade your toaster so it beeps when the toasts are done. You happen to be constantly listening, even while you are doing dishes. On hearing the beep, you queue up in your memory that as soon as you are done washing your current dish, you'll stop and go put the butter on the toast. Or you could choose to interrupt the washing of the current dish, and deal with the toast right away.
If you have trouble hearing the beep, you can have your partner watch the toaster for you, and come tell you when the toast is ready. Your partner can itself choose any of the above three strategies to coordinate its task of watching the toaster and telling you when they are ready.
On a final note, it's good to understand that while non-blocking and async (or what I prefer to call evented) do allow you to do other things while you wait, you don't have to. You can choose to constantly loop on checking the status of a non-blocking call, doing nothing else. That's often worse than blocking though (like looking at the toaster, then away, then back at it until it's done), so a lot of non-blocking APIs allow you to transition into a blocking mode from it. For evented, you can just wait idle until you are notified. The downside in that case is that adding the notification was complex and potentially costly to begin with. You had to buy a new toaster with beep functionality, or convince your partner to watch it for you.
And one more thing: you need to realize the trade-offs all three provide. One is not obviously better than the others. Think of my example. If your toaster is so fast that you won't have time to wash a dish, or even begin washing it, then getting started on something else is just a waste of time and effort. Blocking will do. Similarly, suppose washing a dish takes 10 times longer than the toasting. You have to ask yourself what's more important to get done. The toast might get cold and hard by that time, so it's not worth it, and blocking will also do. Or you should pick faster things to do while you wait. There's more, obviously, but my answer is already pretty long. My point is that you need to think about all that, and about the complexity of implementing each, to decide if it's worth it and if it'll actually improve your throughput or performance.
Edit:
Even though this is already long, I also want it to be complete, so I'll add two more points.
There also commonly exists a fourth model known as multiplexed. This is when while you wait for one task, you start another, and while you wait for both, you start one more, and so on, until you've got many tasks all started and then, you wait idle, but on all of them. So as soon as any is done, you can proceed with handling its response, and then go back to waiting for the others. It's known as multiplexed, because while you wait, you need to check each task one after the other to see if they are done, ad vitam, until one is. It's a bit of an extension on top of normal non-blocking.
In our example it would be like starting the toaster, then the dishwasher, then the microwave, etc. And then waiting on any of them. Where you'd check the toaster to see if it's done, if not, you'd check the dishwasher, if not, the microwave, and around again.
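A small Python sketch of the same idea, assuming made-up appliance names and timings: every task is started first, and then the caller waits on all of them at once, handling whichever finishes next. Here `concurrent.futures.as_completed` hides the "check each one in turn" loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def appliance(name, seconds):
    """Hypothetical task: an appliance that takes `seconds` to finish."""
    time.sleep(seconds)
    return name


def run_all(jobs):
    """Start every job, then handle each one as soon as it is done."""
    finished = []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(appliance, name, secs) for name, secs in jobs]
        for future in as_completed(futures):   # wait idle, but on all of them
            finished.append(future.result())
    return finished
```

Calling `run_all([("dishwasher", 0.4), ("toaster", 0.05), ("microwave", 0.2)])` handles the results in completion order rather than start order.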
Even though I believe it to be a big mistake, synchronous is often used to mean one thing at a time. And asynchronous many things at a time. Thus you'll see synchronous blocking and non-blocking used to refer to blocking and non-blocking. And asynchronous blocking and non-blocking used to refer to multiplexed and evented.
I don't really understand how we got there. But when it comes to IO and Computation, synchronous and asynchronous often refer to what is better known as non-overlapped and overlapped. That is, asynchronous means that IO and Computation are overlapped, aka, happening concurrently. While synchronous means they are not, thus happening sequentially. For synchronous non-blocking, that would mean you don't start other IO or Computation, you just busy wait and simulate a blocking call. I wish people stopped misusing synchronous and asynchronous like that. So I'm not encouraging it.
Edit2:
I think a lot of people got a bit confused by my definition of synchronous and asynchronous. Let me try and be a bit more clear.
Synchronous is defined as happening with predictable timing and/or ordering. That means you know when something will start and end.
Asynchronous is defined as not happening with predictable timing and/or ordering. That means you don't know when something will start and end.
Both of those can be happening in parallel or concurrently, or they can be happening sequentially. But in the synchronous case, you know exactly when things will happen, while in the asynchronous case you're not sure exactly when things will happen, but you can still put some coordination in place that at least guarantees some things will happen only after others have happened (by synchronizing some parts of it).
Thus when you have asynchronous processes, asynchronous programming lets you place some order guarantees so that some things happen in the right sequence, even though you don't know when things will start and end.
Here's an example: suppose B must happen after A, while C can happen at any time. In a sequential but asynchronous model you can have:
A -> B -> C
or
A -> C -> B
or
C -> A -> B
Every time you run the program, you could get a different one of those, seemingly at random. Now this is still sequential, nothing is parallel or concurrent, but you don't know when things will start and end, except you have made it so B always happens after A.
If you add concurrency only (no parallelism), you can also get things like:
A<start> -> C<start> -> A<end> -> C<end> -> B<start> -> B<end>
or
C<start> -> A<start> -> C<end> -> A<end> -> B<start> -> B<end>
or
A<start> -> A<end> -> B<start> -> C<start> -> B<end> -> C<end>
etc...
Once again, you don't really know when things will start and end, but you have made it so B is coordinated to always start after A ends. That's not necessarily immediately after A ends; it's at some unknown time after A ends, and C could happen in between, fully or partially.
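To make the "concurrent but asynchronous" case concrete, here is a minimal `asyncio` sketch. The task names mirror A, B and C above; the random sleeps and the shuffling are only there to make the interleaving vary from run to run. The one guarantee the code puts in place is that B starts after A ends.

```python
import asyncio
import random


async def task(name, log):
    log.append(f"{name}<start>")
    await asyncio.sleep(random.uniform(0, 0.02))   # unpredictable duration
    log.append(f"{name}<end>")


async def a_then_b(log):
    await task("A", log)    # B is coordinated to start only after A ends
    await task("B", log)


async def main():
    log = []
    jobs = [a_then_b(log), task("C", log)]
    random.shuffle(jobs)    # C may be scheduled before or after A
    await asyncio.gather(*jobs)
    return log


def run_once():
    return asyncio.run(main())
```

Running `run_once()` several times can produce different logs, but `A<end>` always appears before `B<start>`.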
And if you add parallelism, now you have things like:
A<start> -> A<end> -> B<start> -> B<end> ->
C<start> -> C<keeps going> -> C<keeps going> -> C<end>
or
A<start> -> A<end> -> B<start> -> B<end>
C<start> -> C<keeps going> -> C<end>
etc...
Now if we look at the synchronous case, in a sequential setting you would have:
A -> B -> C
And this is the order always. Each time you run the program, you get A then B and then C. Even though C conceptually, from the requirements, can happen at any time, in a synchronous model you still define exactly when it will start and end. Of course, you could specify it like:
C -> A -> B
instead, but since it is synchronous, this will be the ordering every time the program is run, unless you change the code again to change the order explicitly.
Now if you add concurrency to a synchronous model you can get:
C<start> -> A<start> -> C<end> -> A<end> -> B<start> -> B<end>
And once again, this would be the order no matter how many times you run the program. And similarly, you could explicitly change it in your code, but it would be consistent across program executions.
Finally, if you add parallelism as well to a synchronous model you get:
A<start> -> A<end> -> B<start> -> B<end>
C<start> -> C<end>
Once again, this would be the case on every program run. An important aspect here is that to make it fully synchronous this way, B must start after both A and C end. If C is an operation that can complete faster or slower, say depending on the CPU power of the machine or some other performance consideration, then to make it synchronous you still need to make B wait for it to end. Otherwise you get asynchronous behavior again, where not all timings are deterministic.
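A minimal sketch of that last point, using plain Python threads: A and C run in parallel, and B is made to wait for both regardless of how long C takes. The function name and durations are illustrative only.

```python
import threading
import time


def run_synchronized(c_seconds):
    """Run A and C in parallel, then B only after both have ended."""
    log = []
    lock = threading.Lock()

    def work(name, seconds):
        with lock:
            log.append(f"{name}<start>")
        time.sleep(seconds)
        with lock:
            log.append(f"{name}<end>")

    a = threading.Thread(target=work, args=("A", 0.01))
    c = threading.Thread(target=work, args=("C", c_seconds))
    a.start()
    c.start()
    a.join()
    c.join()               # B waits for C too, however fast or slow C is
    work("B", 0.01)
    return log
```

Dropping `c.join()` would let B start while C is still running on a slow machine, which is the asynchronous behavior described above.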
You'll get this kind of synchronous thing a lot when coordinating CPU operations with the CPU clock. You have to make sure that you can complete each operation in time for the next clock cycle; otherwise you need to delay everything by one more clock to give room for this one to finish. If you don't, you mess up your synchronous behavior, and if things depended on that order they'd break.
Finally, lots of systems have synchronous and asynchronous behavior mixed in. If you have any kind of inherently unpredictable events, like when a user will click a button or when a remote API will return a response, but you need things to have guaranteed ordering, you will basically need a way to synchronize the asynchronous behavior so it guarantees order and timing as needed. Some strategies to synchronize those are what I talked about previously: blocking, non-blocking, async, multiplexed, etc. See the emphasis on "async"; this is what I mean by the word being confusing. Somebody decided to call a strategy to synchronize asynchronous processes "async". This then wrongly made people think that asynchronous meant concurrent and synchronous meant sequential, or that somehow blocking was the opposite of asynchronous. As I just explained, synchronous and asynchronous in reality are a different concept that relates to the timing of things as being in sync (in time with each other, either on some shared clock or in a predictable order) or out of sync (not on some shared clock or in an unpredictable order). Asynchronous programming, by contrast, is a strategy to synchronize two events that are themselves asynchronous (happening at an unpredictable time and/or order), and for which we need to add some guarantees of when they might happen or at least in what order.
So we're left with two things using the word "asynchronous" in them:
Asynchronous processes: processes that we don't know at what time they will start and end, and thus in what order they would end up running.
Asynchronous programming: a style of programming that lets you synchronize two asynchronous processes using callbacks or watchers that interrupt the executor in order to let them know something is done, so that you can add predictable ordering between the processes.

A nonblocking call returns immediately with whatever data are available: the full number of bytes requested, fewer, or none at all.
An asynchronous call requests a transfer that will be performed in its entirety but will complete at some future time.
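The non-blocking half of this definition can be seen with a pair of connected sockets. The sketch below assumes a typical Unix-like system; `demo` is a made-up name, while `socketpair`, `setblocking`, `recv` and `select` are standard library calls.

```python
import select
import socket


def demo():
    left, right = socket.socketpair()
    right.setblocking(False)
    try:
        try:
            right.recv(1024)          # nothing has been sent yet
            empty = False
        except BlockingIOError:
            empty = True              # returned immediately, with no data
        left.sendall(b"hello")
        # Wait until the bytes are readable so the demo is deterministic.
        select.select([right], [], [], 1.0)
        data = right.recv(1024)       # asks for up to 1024 bytes, gets what is there
        return empty, data
    finally:
        left.close()
        right.close()
```

The second `recv` asks for up to 1024 bytes but returns only what is available, which is the "fewer" case in the definition.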

Putting this question in the context of NIO and NIO.2 in Java 7, async IO is one step more advanced than non-blocking.
With Java NIO non-blocking calls, one would set all channels (SocketChannel, ServerSocketChannel, FileChannel, etc) as such by calling AbstractSelectableChannel.configureBlocking(false).
After those IO calls return, however, you will likely still need to control the checks such as if and when to read/write again, etc.
For instance,
while (!isDataEnough()) {
socketchannel.read(inputBuffer);
// do something else and then read again
}
With the asynchronous API in Java 7, these controls can be made in more versatile ways.
One of the 2 ways is to use CompletionHandler. Notice that both read calls are non-blocking.
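asyncsocket.read(inputBuffer, 60, TimeUnit.SECONDS /* 60 secs for timeout */,
    null /* attachment, unused here */,
    new CompletionHandler<Integer, Object>() {
        // Called when the read finishes; result is the number of bytes read.
        public void completed(Integer result, Object attachment) {...}
        // Called if the read fails or times out.
        public void failed(Throwable e, Object attachment) {...}
    });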

As you can probably see from the multitude of different (and often mutually exclusive) answers, it depends on who you ask. In some arenas, the terms are synonymous. Or they might each refer to two similar concepts:
One interpretation is that the call will do something in the background essentially unsupervised in order to allow the program to not be held up by a lengthy process that it does not need to control. Playing audio might be an example - a program could call a function to play (say) an mp3, and from that point on could continue on to other things while leaving it to the OS to manage the process of rendering the audio on the sound hardware.
The alternative interpretation is that the call will do something that the program will need to monitor, but will allow most of the process to occur in the background only notifying the program at critical points in the process. For example, asynchronous file IO might be an example - the program supplies a buffer to the operating system to write to file, and the OS only notifies the program when the operation is complete or an error occurs.
In either case, the intention is to allow the program to not be blocked waiting for a slow process to complete - how the program is expected to respond is the only real difference. Which term refers to which also changes from programmer to programmer, language to language, or platform to platform. Or the terms may refer to completely different concepts (such as the use of synchronous/asynchronous in relation to thread programming).
Sorry, but I don't believe there is a single right answer that is globally true.

Blocking call: Control returns only when the call completes.
Non blocking call: Control returns immediately. Later OS somehow notifies the process that the call is complete.
Synchronous program: A program which uses Blocking calls. In order not to freeze during the call it must have 2 or more threads (that's why it's called Synchronous - threads are running synchronously).
Asynchronous program: A program which uses Non blocking calls. It can have only 1 thread and still remain interactive.

Non-blocking: This function won't wait while on the stack.
Asynchronous: Work may continue on behalf of the function call after that call has left the stack

Synchronous means to start one after the other's result, in a sequence.
Asynchronous means start together, no sequence is guaranteed on the result
Blocking means something that causes an obstruction to perform the next step.
Non-blocking means something that keeps running without waiting for anything, overcoming the obstruction.
Blocking eg: I knock on the door and wait till they open it. ( I am idle here )
Non-Blocking eg: I knock on the door, if they open it instantly, I greet them, go inside, etc. If they do not open instantly, I go to the next house and knock on it. ( I am doing something or the other, not idle )
Synchronous eg: I will go out only if it rains. ( dependency exists )
Asynchronous eg: I will go out. It can rain. ( independent events, doesn't matter when they occur )
Synchronous or Asynchronous, both can be blocking or non-blocking and vice versa

The blocking models require the initiating application to block when the I/O has started. This means that it isn't possible to overlap processing and I/O at the same time. The synchronous non-blocking model allows overlap of processing and I/O, but it requires that the application check the status of the I/O on a recurring basis. This leaves asynchronous non-blocking I/O, which permits overlap of processing and I/O, including notification of I/O completion.

To put it simply,
function sum(a,b){
return a+b;
}
is non-blocking, while asynchronous execution is used to run a blocking task and then return its response.

Block and synchronous: Blocking I/O must be synchronous I/O, because it has to be executed in order. Synchronous I/O is not necessarily blocking I/O.
Block and asynchronous: does not exist.
Non-block and synchronous: polling/multiplexing.
Non-block and asynchronous: parallel execution, such as a signal trigger.
Block/non-block describes the behavior of the initiating entity itself: what the entity does while waiting for I/O completion.
Synchronous/asynchronous describes the relationship between the entity initiating the I/O and the I/O executor (the operating system, for example): whether these two entities can execute in parallel.

They differ in spelling only. There is no difference in what they refer to. To be technical you could say they differ in emphasis. Non blocking refers to control flow(it doesn't block.) Asynchronous refers to when the event\data is handled(not synchronously.)

Blocking: control returns to the invoking process after processing of the primitive (sync or async) completes
Non blocking: control returns to process immediately after invocation

Are Erlang's recursive functions not just a goto?

Just to get it straight in my head. Consider this example bit of Erlang code:
test() ->
receive
{From, whatever} ->
%% do something
test();
{From, somethingelse} ->
%% do something else
test()
end.
Isn't the test() call, just a goto?
I ask this because in C we learned, if you do a function call, the return location is always put on the stack. I can't imagine this must be the case in Erlang here since this would result in a stackoverflow.
We had 2 different ways of calling functions:
goto and gosub.
goto just steered the program flow somewhere else, and gosub remembered where you came from so you could return.
Given this way of thinking, I can look at Erlang's recursion easier, since if I just read: test() as a goto, then there is no problem at all.
Hence my question: isn't Erlang just using a goto instead of remembering the return address on a stack?
EDIT:
Just to clarify my point:
I know gotos can be used in some languages to jump all over the place. But just suppose instead of doing someFunction() you can also do: goto someFunction()
in the first example the flow returns, in the second example the flow just continues in someFunction and never returns.
So we limit the normal GOTO behaviour by just being able to jump to method starting points.
If you see it like this, then the Erlang recursive function call looks like a goto.
(a goto in my opinion is a function call without the ability to return where you came from). Which is exactly what is happening in the Erlang example.

A tail recursive call is more of a "return and immediately call this other function" than a goto because of the housekeeping that's performed.
Addressing your newest points: recording the return point is just one bit of housekeeping that's performed when a function is called. The return point is stored in the stack frame, the rest of which must be allocated and initialized (in a normal call), including the local variables and parameters. With tail recursion, a new frame doesn't need to be allocated and the return point doesn't need to be stored (the previous value works fine), but the rest of the housekeeping needs to be performed.
There's also housekeeping that needs to be performed when a function returns, which includes discarding locals and parameters (as part of the stack frame) and returning to the call point. During tail recursive call, the locals for the current function are discarded before invoking the new function, but no return happens.
Rather like how threads allow for lighter-weight context switching than processes, tail calls allow for lighter-weight function invocation, since some of the housekeeping can be skipped.
The "goto &NAME" statement in Perl is closer to what you're thinking of, but not quite, as it discards locals. Parameters are kept around for the newly invoked function.
One more, simple difference: a tail call can only jump to a function entry point, while a goto can jump most anywhere (some languages restrict the target of a goto, such as C, where goto can't jump outside a function).

You are correct, the Erlang compiler will detect that it is a tail recursive call, and instead of moving on on the stack, it reuses the current function's stack space.
Furthermore it is also able to detect circular tail-recursion, e.g.
test() -> ..., test2().
test2() -> ..., test3().
test3() -> ..., test().
will also be optimized.
The "unfortunate" side-effect of this is that when you are tracing function calls, you will not be able to see each invocation of a tail recursive function, only the entry and exit points.

You've got two questions here.
First, no, you're not in any danger of overrunning the stack in this case because these calls to test() are both tail-recursive.
Second, no, function calls are not goto, they're function calls. :) The thing that makes goto problematic is that it bypasses any structure in your code. You can jump out of statements, jump into statements, bypass assignments...all kinds of screwiness. Function calls don't have this problem because they have an obvious flow of control.

I think the difference here is between a "real" goto and what can in some cases seem like a goto. In some special cases the compiler can detect that it is free to cleanup the stack of the current function before calling another function. This is when the call is the last call in a function. The difference is, of course, that as in any other call you can pass arguments to the new function.
As others have pointed out this optimisation is not restricted to recursive calls but to all last calls. This is used in the "classic" way of programming FSMs.

It's a goto in the same way that if is goto and while is goto. It is implemented using (the moral equivalent of) goto, but it does not expose the full shoot-self-in-foot potential of goto directly to the programmer.

In fact, these recursive functions are the ultimate GOTO according to Guy Steele.

Here's a more general answer, which supersedes my earlier answer based on call-stacks. Since the earlier answer has been accepted, I won't replace the text.
Prologue
Some architectures don't have things they call "functions" that are "called", but do have something analogous (messaging may call them "methods" or "message handlers"; event based architectures have "event handlers" or simply "handlers"). I'll be using the terms "code block" and "invocation" for the general case, though (strictly speaking) "code block" can include things that aren't quite functions. You can substitute the appropriately inflected form of "call" for "invocation" or "invoke", as I might in a few places. The features of an architecture that describe invocation are sometimes called "styles", as in "continuation passing style" (CPS), though this isn't previously an official term. To keep things from being too abstract, we'll examine call stack, continuation passing, messaging (à la OOP) and event handling invocation styles. I should specify the models I'm using for these styles, but I'm leaving them out in the interest of space.
Invocation Features
or, C Is For Continuation, Coordination and Context, That's Good Enough For Me
Hohpe identifies three nicely alliterative invocation features of the call-stack style: Continuation, Coordination, Context (all capitalized to distinguish them from other uses of the words).
Continuation decides where execution will continue when a code block finishes. The "Continuation" feature is related to "first-class continuations" (often simply called "continuations", including by me), in that continuations make the Continuation feature visible and manipulable at a programmatic level.
Coordination means code doesn't execute until the data it needs is ready. Within a single call stack, you get Coordination for free because the program counter won't return to a function until a called function finishes. Coordination becomes an issue in (e.g.) concurrent and event-driven programming, the former because a data producer may fall behind a data consumer and the latter because when a handler fires an event, the handler continues immediately without waiting for a response.
Context refers to the environment that is used to resolve names in a code block. It includes allocation and initialization of the local variables, parameters and return value(s). Parameter passing is also covered by the calling convention (keeping up the alliteration); for the general case, you could split Context into a feature that covers locals, one that covers parameters and another for return values. For CPS, return values are covered by parameter passing.
The three features aren't necessarily independent; invocation style determines their interrelationships. For instance, Coordination is tied to Continuation under the call-stack style. Continuation and Context are connected in general, since return values are involved in Continuation.
Hohpe's list isn't necessarily exhaustive, but it will suffice to distinguish tail-calls from gotos. Warning: I might go off on tangents, such as exploring invocation space based on Hohpe's features, but I'll try to contain myself.
Invocation Feature Tasks
Each invocation feature involves tasks to be completed when invoking a code block. For Continuation, invoked code blocks are naturally related by a chain of invoking code. When a code block is invoked, the current invocation chain (or "call chain") is extended by placing a reference (an "invocation reference") to the invoking code at the end of the chain (this process is described more concretely below). Taking into account invocation also involves binding names to code blocks and parameters, we see even non-bondage-and-discipline languages can have the same fun.
Tail Calls
or, The Answer
or, The Rest Is Basically Unnecessary
Tail calling is all about optimizing Continuation, and it's a matter of recognizing when the main Continuation task (recording an invocation reference) can be skipped. The other feature tasks stand on their own. A "goto" represents optimizing away tasks for Continuation and Context. That's pretty much why a tail call isn't a simple "goto". What follows will flesh out what tail calls look like in various invocation styles.
Tail Calls In Specific Invocation Styles
Different styles arrange invocation chains in different structures, which I'll call a "tangle", for lack of a better word. Isn't it nice that we've gotten away from spaghetti code?
With a call-stack, there's only one invocation chain in the tangle; extending the chain means pushing the program counter. A tail call means no program counter push.
Under CPS, the tangle consists of the extant continuations, which form a reverse arborescence (a directed tree where every edge points towards a central node), where each path back to the center is an invocation chain (note: if the program entry point is passed a "null" continuation, the tangle can be a whole forest of reverse arborescences). One particular chain is the default, which is where an invocation reference is added during invocation. Tail calls won't add an invocation reference to the default invocation chain. Note that "invocation chain" here is basically synonymous with "continuation", in the sense of "first class continuation".
Under message passing, the invocation chain is a chain of blocked methods, each waiting for a response from the method before it in the chain. A method that invokes another is a "client"; the invoked method is a "supplier" (I'm purposefully not using "service", though "supplier" isn't much better). A messaging tangle is a set of unconnected invocation chains. This tangle structure is rather like having multiple thread or process stacks. When the method merely echos another method's response as its own, the method can have its client wait on its supplier rather than itself. Note that this gives a slightly more general optimization, one that involves optimizing Coordination as well as Continuation. If the final portion of a method doesn't depend on a response (and the response doesn't depend on the data processed in the final portion), the method can continue once it's passed on its client's wait dependency to its supplier. This is analogous to launching a new thread, where the final portion of the method becomes the thread's main function, followed by a call-stack style tail call.
What About Event Handling Style?
With event handling, invocations don't have responses and handlers don't wait, so "invocation chains" (as used above) isn't a useful concept. Instead of a tangle, you have priority queues of events, which are owned by channels, and subscriptions, which are lists of listener-handler pairs. In some event driven architectures, channels are properties of listeners; every listener owns exactly one channel, so channels become synonymous with listeners. Invoking means firing an event on a channel, which invokes all subscribed listener-handlers; parameters are passed as properties of the event. Code that would depend on a response in another style becomes a separate handler under event handling, with an associated event. A tail call would be a handler that fires the event on another channel and does nothing else afterwards. Tail call optimization would involve re-subscribing listeners for the event from the second channel to the first, or possibly having the handler that fired the event on the first channel instead fire on the second channel (an optimization made by the programmer, not the compiler/interpreter). Here's what the former optimization looks like, starting with the un-optimized version.
1. Listener Alice subscribes to event "inauguration" on BBC News, using handler "party"
2. Alice fires event "election" on channel BBC News
3. Bob is listening for "election" on BBC News, so Bob's "openPolls" handler is invoked
4. Bob subscribes to event "inauguration" on channel CNN.
5. Bob fires event "voting" on channel CNN
6. Other events are fired & handled. Eventually, one of them ("win", for example) fires event "inauguration" on CNN.
7. Bob's barred handler fires "inauguration" on BBC News
8. Alice's inauguration handler is invoked.
And the optimized version:
1. Listener Alice subscribes to event "inauguration" on BBC News
2. Alice fires event "election" on channel BBC News
3. Bob is listening for "election" on BBC News, so Bob's "openPolls" handler is invoked
4. Bob subscribes anyone listening for "inauguration" on BBC News to the inauguration event on CNN*.
5. Bob fires event "voting" on channel CNN
6. Other events are fired & handled. Eventually, one of them fires event "inauguration" on CNN.
7. Alice's inauguration handler is invoked for the inauguration event on CNN.
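The two versions can be sketched with a toy event bus in Python. Everything here is a simplifying assumption: the `Channels` class is hypothetical, dispatch is synchronous rather than queued, and the intermediate "election" and "voting" events are left out so that only the forwarding step is visible.

```python
from collections import defaultdict


class Channels:
    """Toy event bus: handlers are keyed by (channel, event)."""

    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, channel, event, handler):
        self.subs[(channel, event)].append(handler)

    def fire(self, channel, event):
        for handler in list(self.subs[(channel, event)]):
            handler()


def unoptimized():
    bus, parties = Channels(), []
    bus.subscribe("BBC News", "inauguration", lambda: parties.append("party"))
    # Bob's CNN handler does nothing except re-fire the event on BBC News.
    bus.subscribe("CNN", "inauguration",
                  lambda: bus.fire("BBC News", "inauguration"))
    bus.fire("CNN", "inauguration")
    return parties


def optimized():
    bus, parties = Channels(), []
    bus.subscribe("BBC News", "inauguration", lambda: parties.append("party"))
    # Bob re-subscribes the BBC News listeners directly on CNN instead.
    for handler in list(bus.subs[("BBC News", "inauguration")]):
        bus.subscribe("CNN", "inauguration", handler)
    bus.fire("CNN", "inauguration")
    return parties
```

In both versions Alice's handler runs once; the optimized one skips the extra hop through Bob's forwarding handler.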
Note tail calls are trickier (untenable?) under event handling because they have to take into account subscriptions. If Alice were later to unsubscribe from "inauguration" on BBC News, the subscription to inauguration on CNN would also need to be canceled. Additionally, the system must ensure it doesn't inappropriately invoke a handler multiple times for a listener. In the above optimized example, what if there's another handler for "inauguration" on CNN that fires "inauguration" on BBC News? Alice's "party" event will be fired twice, which may get her in trouble at work. One solution is to have *Bob unsubscribe all listeners from "inauguration" on BBC News in step 4, but then you introduce another bug wherein Alice will miss inauguration events that don't come via CNN. Maybe she wants to celebrate both the U.S. and British inaugurations. These problems arise because there are distinctions I'm not making in the model, possibly based on types of subscriptions. For instance, maybe there's a special kind of one-shot subscription (like System-V signal handlers) or some handlers unsubscribe themselves, and tail call optimization is only applied in these cases.
What's next?
You could go on to more fully specify invocation feature tasks. From there, you could figure out what optimizations are possible, and when they can be used. Perhaps other invocation features could be identified. You could also think of more examples of invocation styles. You could also explore the dependencies between invocation features. For instance, synchronous and asynchronous invocation involve explicitly coupling or uncoupling Continuation and Coordination. It never ends.
Get all that? I'm still trying to digest it myself.
References:
Hohpe, Gregor; "Event-Driven Architecture"
Sugalski, Dan; "CPS and tail calls--two great tastes that taste great together"

In this case it is possible to do tail-call optimization, since we don't need to do more work or make use of local variables. So the compiler will convert this into a loop.
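As an illustration of what "convert this into a loop" means, here is a hedged Python rendering of the `test()` function from the question. The `mailbox` deque stands in for the Erlang process mailbox, and unlike `receive`, this sketch stops when the mailbox is empty instead of blocking for the next message.

```python
from collections import deque


def serve(mailbox):
    """Loop form of the tail-recursive receive loop."""
    handled = []
    while mailbox:                        # each tail call becomes "go round again"
        sender, message = mailbox.popleft()
        if message == "whatever":
            handled.append((sender, "did something"))
        elif message == "somethingelse":
            handled.append((sender, "did something else"))
    return handled
```

No stack frame accumulates per message, so the number of messages handled is not limited by stack depth.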

(a goto in my opinion is a function call without the ability to return where you came from). Which is exactly what is happening in the erlang example.
That is not what's happening in Erlang, you CAN return to where you came from.
The calls are tail-recursive, which means that it is "sort of" a goto. Make sure you understand what tail-recursion is before you attempt to understand or write any code. Reading Joe Armstrong's book probably isn't a bad idea if you are new to Erlang.
Conceptually, in the case where you call yourself using test(), a call is made to the start of the function using whatever parameters you pass (none in this example), but nothing more is added to the stack. So all your variables are thrown away and the function starts fresh, but you didn't push a new return pointer onto the stack. So it's like a hybrid between a goto and a traditional imperative-language-style function call like you'd have in C or Java. But there IS still one entry on the stack from the very first call from the calling function. So when you eventually exit by returning a value rather than doing another test(), that return location is popped from the stack and execution resumes in your calling function.
