I'm using Tokio and I want to receive requests from two different mpsc queues. select! seems like the way to go, but I'm not sure what the difference is between futures::select! and tokio::select!. Under which circumstances should you use one over the other?
tokio::select! was built out of experience with futures::select!, but improves on it to make it more ergonomic. For example, the futures-rs version of select! requires futures to implement FusedFuture, whereas Tokio's version no longer requires this.
Instead of this, Tokio's version supports preconditions in the macro to cover the same use-cases.
The PR in the tokio repo elaborates a bit more on this.
This change was also proposed for the futures-rs version, but has not been implemented there so far.
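To make that concrete for the scenario in the question, here is a minimal sketch (function and channel names are invented) of tokio::select! multiplexing two mpsc receivers. The if clause on the first branch is one of the preconditions mentioned above, used here to stop polling a queue once it has closed:

    use tokio::sync::mpsc;

    async fn dispatch(mut rx1: mpsc::Receiver<String>, mut rx2: mpsc::Receiver<String>) {
        let mut queue1_open = true;
        loop {
            tokio::select! {
                // Precondition: skip this branch once the first queue has closed.
                msg = rx1.recv(), if queue1_open => {
                    match msg {
                        Some(m) => println!("queue 1: {m}"),
                        None => queue1_open = false, // channel closed, disable branch
                    }
                }
                msg = rx2.recv() => {
                    match msg {
                        Some(m) => println!("queue 2: {m}"),
                        None => break, // second queue closed, stop dispatching
                    }
                }
            }
        }
    }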
If you already have Tokio included in your project, then using Tokio's version seems preferable. But if you have not and do not want to add an additional dependency, then the futures-rs version will cover most use-cases too in a nearly identical fashion. The main difference is that some Futures might need to be converted into FusedFutures through the FutureExt::fuse() extension method.
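For comparison, here is the same kind of selection written against futures-rs (again with invented task names); note the explicit fuse() and pinning that tokio::select! does not need:

    use futures::{future::FutureExt, pin_mut, select};

    async fn task_a() -> u32 { 1 }
    async fn task_b() -> u32 { 2 }

    async fn run() {
        // futures::select! requires FusedFuture, so each future is fused first,
        // then pinned so it can be polled by mutable reference.
        let a = task_a().fuse();
        let b = task_b().fuse();
        pin_mut!(a, b);

        select! {
            x = a => println!("task_a finished first: {x}"),
            y = b => println!("task_b finished first: {y}"),
        }
    }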
To complement @matthias247's answer, a related big difference is that futures::select! takes futures in branch expressions by mutable reference, so uncompleted futures can be re-used in a loop.
tokio::select!, on the other hand, consumes the passed futures. To get behavior similar to futures::select! you need to explicitly pass a reference (e.g. &mut future), and pin it if necessary (e.g. if it comes from an async fn). The Tokio docs have a section on this: Resuming an async operation.
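Here is a minimal sketch of that resuming pattern (the sleep stands in for any long-running operation, and the channel is invented), closely following the approach in the Tokio docs:

    use tokio::sync::mpsc;
    use tokio::time::{sleep, Duration};

    async fn run(mut rx: mpsc::Receiver<u32>) {
        // A long-running future we want to keep polling across loop iterations.
        let operation = sleep(Duration::from_secs(5));
        tokio::pin!(operation); // pin it so &mut operation can be polled repeatedly

        loop {
            tokio::select! {
                _ = &mut operation => {
                    println!("operation completed");
                    break;
                }
                Some(v) = rx.recv() => {
                    println!("handled message {v} while the operation was pending");
                }
            }
        }
    }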
This thread has an in-depth explanation of why Tokio decided not to use FusedFuture.
I am trying to understand all the low-level stuff Compilers / Interpreters / the Kernel do for you (because I'm yet another person who thinks they could design a language that's better than most others)
One of the many things that sparked my curiosity is Async-Await.
I've checked the under-the-hood implementation for a couple of languages, including C# (where the compiler generates the state machine from the sugared code) and Rust (where the compiler-generated state machine implements the Future trait), and they all implement Async-Await using state machines.
I've not found anything useful by googling ("async copy stack frame" and variations) or in the "Similar questions" section.
To me, this method seems rather complicated and overhead-heavy.
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from the heap?
I'm aware that it is architecturally impossible for some languages (I think the CLR can't do it, so C# can't either).
Am I missing something that makes this logically impossible? I would expect less complicated code and a performance boost from doing it that way; am I mistaken? I suppose when you have a deep stack hierarchy after an async call (e.g. a recursive async function) the amount of data you would have to memcopy is rather large, but there are probably ways to work around that.
If this is possible, then why isn't it done anywhere?
Yes, an alternative to converting code into state machines is copying stacks around. This is the way the Go language does it now, and the way Java will do it when Project Loom is released.
It's not an easy thing to do for real-world languages.
It doesn't work for C and C++, for example, because those languages let you make pointers to things on the stack. Those pointers can be used by other threads, so you can't move the stack away, and even if you could, you would have to copy it back into exactly the same place.
For the same reason, it doesn't work when your program calls out to the OS or native code and gets called back on the same thread, because there's a portion of the stack you don't control. In Java, Project Loom's 'virtual threads' will not release the thread as long as there's native code on the stack.
Even in situations where you can move the stack, it requires dedicated support in the runtime environment. The stack can't just be copied into a byte array. It has to be copied off in a representation that allows the garbage collector to recognize all the pointers in it. If C# were to adopt this technique, for example, it would require significant extensions to the common language runtime, whereas implementing state machines can be accomplished entirely within the C# compiler.
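To make the state-machine idea concrete, here is a hand-written Rust sketch (all names invented, and simplified with an Unpin bound) of roughly the shape a compiler generates for an async function with a single await point. The function's 'frame', its locals and its current position, lives inside the enum wherever the future is stored, not on the call stack:

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Hand-rolled equivalent of something like:
    //     async fn add_one(fut: F) -> u32 { fut.await + 1 }
    enum AddOne<F> {
        Start(F), // waiting on the inner future; its value is a "local"
        Done,     // completed; polling again would be a bug
    }

    impl<F: Future<Output = u32> + Unpin> Future for AddOne<F> {
        type Output = u32;

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
            match &mut *self {
                AddOne::Start(inner) => match Pin::new(inner).poll(cx) {
                    Poll::Ready(v) => {
                        *self = AddOne::Done;
                        Poll::Ready(v + 1)
                    }
                    // Suspend: state survives in the enum, no stack to save.
                    Poll::Pending => Poll::Pending,
                },
                AddOne::Done => panic!("polled after completion"),
            }
        }
    }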
I would first like to begin by saying that this answer is only meant to serve as a starting point for your exploration. It includes various pointers and builds on the work of several other authors.
I've checked the under-the-hood implementation for a couple of languages, including C# (where the compiler generates the state machine from the sugared code) and Rust (where the compiler-generated state machine implements the Future trait), and they all implement Async-Await using state machines
You understood correctly that the Async/Await implementation for C# and Rust use state machines. Let us understand now as to why are those implementations chosen.
To put the general structure of stack frames in very simple terms: whatever we put inside a stack frame is a temporary allocation that will not outlive the method whose call created the frame (including, but not limited to, local variables). A frame also contains the continuation, i.e. the address of the code that needs to be executed next (in other words, where control has to return to) once the recently called method finishes.

In synchronous execution, methods run one after the other; the caller is suspended until the called method finishes. From a stack perspective this fits intuitively: once a called method finishes, control returns to the caller and the frame can be popped off. It is also cheap and efficient from the perspective of the hardware running the code (hardware is optimised for working with stacks).
In the case of asynchronous code, the continuation of a method might have to trigger several other methods that get called from within the continuations of callers. Take a look at this answer, where Eric Lippert outlines how the stack works for an asynchronous flow. The problem with asynchronous flow is that the method calls do not exactly form a stack, and trying to handle them like pure stacks can get extremely complicated. As Eric says in that answer, this is why C# uses a graph of heap-allocated tasks and delegates to represent the workflow.
However, languages like Go handle asynchrony in a different way altogether. Go has goroutines, and there is no need for await statements. Each goroutine is started as its own lightweight thread (each has its own stack, which defaults to 8 KB in size), and synchronization between goroutines is achieved through communication over channels. These lightweight threads can wait asynchronously for a read operation on a channel and suspend themselves. Go's earlier implementation used the SplitStacks technique. That implementation had its own problems, as listed out here, and was replaced by Contiguous Stacks. The article also talks about the newer implementation.
One important thing to note here is that it is not just the complexity of handling continuations between tasks that drives the choice of Async/Await implementation; other factors like garbage collection play a role too. The GC process should be as performant as possible, and if we move stacks around, GC becomes inefficient, because accessing an object would then require thread synchronization.
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?
In short, you can. As this answer states, Chicken Scheme uses something similar to what you are exploring. It begins by allocating everything on the stack and moves stack values to the heap when the stack becomes too large for the GC's activities (Chicken Scheme uses a generational GC). However, there are certain caveats with this kind of implementation; take a look at this FAQ of Chicken Scheme. There is also a lot of academic research in this area (linked in the answer referred to at the beginning of this paragraph, and summarised under further reading below) that you may want to look at.
Further Reading
Continuation Passing Style
call-with-current-continuation
The classic SICP book
This answer (contains a few links to academic research in this area)
TLDR
The choice of approach depends on factors that affect the overall usability and performance of the language. State machines are not the only way to implement Async/Await functionality as done in C# and Rust. Languages like Go implement a contiguous-stack approach coordinated over channels for asynchronous operations; Chicken Scheme allocates everything on the stack and moves recent stack values to the heap when they become too heavy for its GC algorithm's performance. Moving stacks around has its own set of implications that affect garbage collection negatively. Going through the research done in this space will help you understand the advancements and the rationale behind each approach. At the same time, you should also give thought to how you plan to design and implement the other parts of your language, for it to come anywhere close to being usable in terms of performance and overall usability.
PS: Given the length of this answer, I will be happy to correct any inconsistencies that may have crept in.
I have been looking into various strategies for doing this myself, because I naturally think I can design a language better than anybody else, same as you. I just want to emphasize that when I say better, I actually mean better as in it tastes better to my liking, not objectively better.
I have come to a few different approaches, and to summarize: It really depends on many other design choices you have made in the language.
It is all about compromises; each approach has advantages and disadvantages.
It feels like the compiler design community is still very focused on garbage collection and minimizing memory waste; perhaps there is room for innovation from lazier, less purist language designers, given the vast resources available to modern computers?
How about not having a call stack at all?
It is possible to implement a language without using a call stack.
Pass continuations. The function currently running is responsible for keeping and resuming the state of the caller. Async/await and generators come naturally.
Preallocated static memory addresses for all local variables in all declared functions in the entire program. This approach causes other problems, of course.
If this is your design, then async functions seem trivial.
Tree-shaped stack
With a tree-shaped stack, you can keep all stack frames until the function is completely done. It does not matter if you allow progress on any ancestor stack frame, as long as you let the async frame live on until it is no longer needed.
Linear stack
How about serializing the function state? It seems like a variant of continuations.
Independent stack frames on the heap
Simply treat invocations like you treat other pointers to any value on the heap.
All of the above are trivialized approaches, but they have one thing in common related to your question:
Just find a way to store any locals needed to resume the function, and don't forget to store the program counter in the frame as well.
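As a toy illustration of that last point, here is a minimal Rust sketch (all names invented) of a 'function' whose frame is an ordinary struct holding its locals plus a program counter, so it can live on the heap and be suspended and resumed at will:

    // A resumable "function" whose frame is just data: locals plus a
    // program counter recording where to pick up again.
    struct Frame {
        pc: usize, // index of the next step to execute
        x: i64,    // locals survive suspension because they live in the
        y: i64,    // frame itself, not on the machine stack
    }

    enum Step {
        Yielded(i64),
        Done(i64),
    }

    fn resume(frame: &mut Frame) -> Step {
        match frame.pc {
            0 => {
                frame.x = 10;
                frame.pc = 1;
                Step::Yielded(frame.x) // suspend; caller may resume later
            }
            1 => {
                frame.y = frame.x * 2;
                frame.pc = 2;
                Step::Done(frame.x + frame.y)
            }
            _ => panic!("resumed after completion"),
        }
    }

    fn main() {
        let mut frame = Frame { pc: 0, x: 0, y: 0 };
        loop {
            match resume(&mut frame) {
                Step::Yielded(v) => println!("suspended with {v}"),
                Step::Done(v) => {
                    println!("finished with {v}");
                    break;
                }
            }
        }
    }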
I confess that I haven't studied core.async yet, i.e. I don't know the Clojure way to work asynchronously, but I know it mostly uses channels. I work mainly in ClojureScript and I'm going to start writing a service worker.
I found this library for writing promises as channels, but it feels like it would not take much work to do the same without the library.
So, should I use channels over promises in any situation?
Is there a simple conversion from promises to core.async channels?
If you look over the original rationale for core.async, it becomes clearer where it has advantages over using another thread, such as with future. ClojureScript was one of the big drivers, since it is single-threaded and there are no other options.
Some resources:
https://clojure.org/news/2013/06/28/clojure-core-async-channels
https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
https://cognitect.com/videos.html (2 on CLJS core.async)
https://github.com/cognitect/async-webinar
https://rigsomelight.com/drafts/clojurescript-core-async-todos.html
https://medium.com/@loganpowell/cljs-core-async-101-f6522faf536d
Can synchronous and asynchronous functions be integrated into one call/interface whilst maintaining static typing? If possible, can it remain neutral with inheritance, i.e. not wrapping sync methods in async or vice versa (though this might be the best way).
I've been reading around and see it's generally recommended to keep these separate (http://www.tagwith.com/question_61011_pattern-for-writing-synchronous-and-asynchronous-methods-in-libraries-and-keepin and Maintain both synchronous and asynchronous implementations). However, the reason I want to do this is that I'm creating a behaviour tree framework for the Dart language and am finding it hard to mix both sync and async 'nodes' together to iterate through. It seems these might need to be kept separate, meaning nodes that would suit a sync approach would have to be made async, or the opposite, if they are to be within the same 'tree'.
I'm looking for a solution particularly for Dart lang, although I know this is firmly in the territory of general programming concepts. I'm open to this not being able to be achieved, but worth a shot.
Thank you for reading.
You can of course use sync and async functions together. What you can't do is go back to sync execution after a call of an async function.
Maintaining both sync and async methods is, in my opinion, mostly a waste of time. Sometimes sync versions are convenient so you don't have to invoke an async call for some simple operation, but in general async is an integral part of Dart. If you want to use Dart, you have to get used to it.
With the new async/await feature you can write code that uses async functions almost the same as when only sync functions are used.
Does Spirit provide any capabilities for working with non-blocking IO?
To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.
The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.
It seems that in order to support non-blocking I/O while parsing, Spirit would need the ability to partially parse the data, pausing and saving its parse state when no more data is available, and resuming from the saved state when data does become available. Or maybe I'm making this too complicated?
TODO: will post an example of a simple single-threaded 'event-based' parsing model. This is largely trivial but might be just what you need.
For anything less trivial, please heed the following considerations/hints/tips:
How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?
That doesn't usually work well due to backtracking. The caveats can be worked around by careful and judicious use of qi::hold, qi::locals, and by putting semantic actions with side effects only at stations that will never be backtracked. In other words:
this is bound to be very error-prone
this naturally applies only to a limited set of grammars (grammars with rich contextual information will not lend themselves well to this treatment).
Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.
Now, if you still want to do this:
You should be able to make the Spirit library thread-safe/reentrant by defining BOOST_SPIRIT_THREADSAFE and linking to libboost_thread. Note this makes the globals used by Spirit thread-safe (at the cost of fine-grained locking), but not your parsers: you can't share your own parsers/rules/sub-grammars/expressions across threads. In fact, you can only share your own (Phoenix/Fusion) functors iff they are thread-safe, and any other extensions defined outside the core Spirit library should be audited for thread-safety.
If you manage the above, I think by far the best approach would seem to be to:
use boost::spirit::istream_iterator (or, for binary/raw character streams I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that depending on your grammar, quite a bit of memory could be used for buffering and the performance is suboptimal
run the parser on its own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coroutines')
use coarse-grained semantic actions, as shown above, to pass messages to another logical thread that does the actual processing.
Some more loose pointers:
you can easily 'fuse' some functions to handle lazy evaluation of your semantic action handlers using BOOST_FUSION_ADAPT_FUNCTION and friends; this reduces the amount of cruft you have to write to get simple things working, like normal C++ overload resolution in semantic actions, especially when you're not using C++0x and BOOST_RESULT_OF_USE_DECLTYPE
Because you will want to avoid semantic actions with side effects, you should probably look at inherited attributes and qi::locals<> to coordinate state across rules in a 'pure functional fashion'.
I have been learning F# recently, being particularly interested in its ease of exploiting data parallelism. The data |> Array.map |> Async.Parallel |> Async.RunSynchronously idiom seems very easy to understand and straightforward to use and get real value from.
So why is it that async is not really intended for this? Donald Syme himself says that PLINQ and Futures are probably a better choice, and other answers I've read here agree with that as well as recommending the TPL. (PLINQ doesn't seem too different from the built-in functions above, as long as you're using the F# PowerPack to get the PSeq functions.)
F# and functional languages make a lot of sense for this, and some applications have achieved great success with async parallelism.
So why shouldn't I use async to execute parallel data processes? What am I going to lose by writing parallel async code instead of using PLINQ or TPL?
So why shouldn't I use async to execute parallel data processes?
If you have a tiny number of completely independent non-async tasks and lots of cores, then there is nothing wrong with using async to achieve parallelism. However, if your tasks are dependent in any way, if you have more tasks than cores, or if you push the use of async too far into the code, then you will be leaving a lot of performance on the table and could do much better by choosing a more appropriate foundation for parallel programming.
Note that your example can be written even more elegantly using the TPL from F#, though:
Array.Parallel.map f xs
What am I going to lose by writing parallel async code instead of using PLINQ or TPL?
You lose the ability to write cache-oblivious code and, consequently, will suffer many cache misses, with all cores stalling while waiting on shared memory; that means poor scalability on a multicore.
The TPL is built upon the idea that child tasks should execute on the same core as their parent with a high probability and, therefore, will benefit from reusing the same data because it will be hot in the local CPU cache. There is no such assurance with async.
I wrote an article that re-implements a C# TPL sample using both Task and Async, and which also has some comments on the difference between the two. You can find it here, and there is also a more advanced async-based version.
Here is a quote from the first article that compares the two options:
The choice between the two possible implementations depends on many factors. Asynchronous workflows were designed specifically for F#, so they more naturally fit with the language. They offer better performance for I/O bound tasks and provide more convenient exception handling. Moreover, the sequential syntax is quite convenient. On the other hand, tasks are optimized for CPU bound calculations and make it easier to access the result of calculation from other places of the application without explicit caching.
I always figured it's about what the TPL, PLINQ, etc. give you over and above what async does (cancellation mechanisms are the ones that come to mind). This question has some better answers.
This article hints at a slight performance advantage to TPL, but probably not enough to be significant.