libuv vs asyncio (python)

I have been trying to find the difference in implementation between uvloop and the built-in asyncio that ships with Python. Apart from the fact that libuv, the base of uvloop, is written in C, no other factor is mentioned on the web. I would like to know what other factors affect the performance difference between them.
Also, on a side note, this blog compares the performance of streams and plain asyncio. Aren't streams built from asyncio, and thus dependent on it?

As you said, uvloop is built on top of libuv; uvloop itself is written in Cython, which compiles to C.
Moving hot code paths from pure Python to Cython usually gives a noticeable speed boost, which is exactly what's happening here. No other difference is needed to explain it. It's much like NumPy performing array operations faster than the equivalent loops written in plain Python.
For your other question: The difference between asyncio and asyncio-streams is that streams are built on top of the basic asyncio.
Asyncio uses transports and protocols, the first responsible for writing to the socket, and the second for handling data received by the socket.
Streams are simple constructs built on top of both, and have an easier to use interface that mimics regular files or sockets.
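To make the layering concrete, here is a minimal sketch that uses only the standard library. One end of a connected socket pair is driven through the low-level transport/protocol API, the other through the streams API. The names EchoProtocol and echo_roundtrip, and the use of socket.socketpair() instead of a real network connection, are assumptions made for illustration and are not part of the original answer.

```python
import asyncio
import socket


class EchoProtocol(asyncio.Protocol):
    """Low-level API: the protocol gets callbacks for incoming data,
    and the transport is what actually writes to the socket."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Send the received bytes straight back through the transport.
        self.transport.write(data)


async def echo_roundtrip(message: bytes) -> bytes:
    loop = asyncio.get_running_loop()
    # A connected pair of sockets stands in for a real client/server link.
    echo_sock, client_sock = socket.socketpair()

    # Transport/protocol side: attach EchoProtocol to one end.
    transport, _ = await loop.create_connection(EchoProtocol, sock=echo_sock)

    # Streams side: reader/writer objects wrap a transport/protocol pair
    # behind a file-like interface.
    reader, writer = await asyncio.open_connection(sock=client_sock)
    writer.write(message)
    await writer.drain()
    reply = await reader.readexactly(len(message))

    writer.close()
    await writer.wait_closed()
    transport.close()
    return reply


if __name__ == "__main__":
    print(asyncio.run(echo_roundtrip(b"hello")))
```

Both ends run on the same event loop. The streams side simply hides the callbacks that the protocol side has to implement by hand.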

Related

What is the difference between futures::select! and tokio::select?

I'm using Tokio and I want to receive requests from two different mpsc queues. select! seems like the way to go, but I'm not sure what the difference is between futures::select! and tokio::select!. Under which circumstances should you use one over the other?

tokio::select! was built out of experiences with futures::select!, but improves a bit on it to make it more ergonomic. E.g. the futures-rs version of select! requires Futures to implement FusedFuture, whereas Tokio's version no longer requires this.
Instead of this, Tokio's version supports preconditions in the macro to cover the same use-cases.
The PR in the tokio repo elaborates a bit more on this.
This change was also proposed for the futures-rs version, but has not been implemented there so far.
If you already have Tokio included in your project, then using Tokio's version seems preferable. But if you have not and do not want to add an additional dependency, then the futures-rs version will cover most use-cases too in a nearly identical fashion. The main difference is that some Futures might need to be converted into FusedFutures through the FutureExt::fuse() extension method.

To complement @matthias247's answer, a related big difference is that futures::select! takes futures in branch expressions by mutable reference, so uncompleted futures can be re-used in a loop.
tokio::select!, on the other hand, consumes the futures passed to it. To get behavior similar to futures::select! you need to explicitly pass a reference (e.g. &mut future), and pin it if necessary (e.g. if it comes from an async fn). The Tokio docs have a section on this, "Resuming an async operation".
This thread has an in-depth explanation of why Tokio decided not to use FusedFuture.

Channels over promises. Why and how to use?

I confess that I haven't studied core.async yet. That is, I don't know the Clojure way to work asynchronously, but I know that it mostly uses channels. I work mainly in ClojureScript and I'm going to start writing a service worker.
I found this library to write promises as channels, but it feels like the amount of work is about the same whether I use the library or not.
So, should I use channels over promises in any situation?
Is there a simple conversion from promises to core.async channels?

If you look over the original rationale for core.async, it becomes clearer when it has advantages over using another thread, such as with future. ClojureScript was one of the big drivers, since it is single-threaded and there are no other options.
Some resources:
https://clojure.org/news/2013/06/28/clojure-clore-async-channels
https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
https://cognitect.com/videos.html (2 on CLJS core.async)
https://github.com/cognitect/async-webinar
https://rigsomelight.com/drafts/clojurescript-core-async-todos.html
https://medium.com/@loganpowell/cljs-core-async-101-f6522faf536d

How to use non-blocking point-to-point MPI routines instead of collectives

In my program, I would like to heavily parallelize many mathematical calculations, the results of which are then written to an output file.
I successfully implemented that using collective communication (gather, scatter, etc.), but I noticed that with these synchronizing routines the slowest processor dominates the execution time and heavily degrades overall performance, as fast processors spend a lot of time waiting.
So I decided to switch to a scheme where one (master) processor is dedicated to receiving chunks of results and handling the file output, and all the other processors calculate these results and send them to the master using non-blocking send routines.
Unfortunately, I don't really know how to implement the master code. Do I need to run an infinite loop with MPI_Recv(), listening for incoming messages? How do I know when to stop the loop? Can I combine MPI_Isend() and MPI_Recv(), or do both methods need to be non-blocking? How is this typically done?

MPI 3.1 provides non-blocking collectives. I would strongly recommend using those instead of implementing this on your own.
However, it may not help you after all. Eventually you need the data from all processes, even the slow ones. So you are likely to wait at some point again. Non-blocking communication overlaps communication and computation, but it doesn't fix your load imbalances.
Update (more or less a long clarification comment)
There are several layers to your question, and the title might have confused me as to what kind of answer you were expecting. Maybe the question is rather
How do I implement a centralized work queue in MPI?
This pops up regularly, most recently here. But it is often undesirable, because a central component quickly becomes a bottleneck in large-scale programs. So the actual problem you have is that your work decomposition and mapping is imbalanced. So the more fundamental "X-question" is
How do I load balance an MPI application?
At that point you must provide more information about your mathematical problem and its current implementation, preferably in the form of a minimal, complete, and verifiable example (MCVE). Again, there is no standard solution. Load balancing is a huge research area. It may even be a topic for CS.SE rather than SO.

How to use non-blocking or asynchronous IO with Boost Spirit?

Does Spirit provide any capabilities for working with non-blocking IO?
To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.
The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.
It seems like that in order to support non-blocking I/O while parsing, Spirit would need some ability to partially parse the data and be able to pause and save its parse state when no more data is available. Additionally, it would need to be able to resume parsing from the saved parse state when data does become available. Or maybe I'm making this too complicated?

TODO Will post an example for a simple single-threaded 'event-based' parsing model. This is largely trivial but might just be what you need.
For anything less trivial, please heed the following considerations/hints/tips:
How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?
That doesn't usually work well due to backtracking. The caveats could be worked around by careful and judicious use of qi::hold, qi::locals and putting semantic actions with side-effects only at stations that will never be backtracked. In other words:
this is bound to be very error-prone
this naturally applies to a limited set of grammars only (those grammars with rich contextual information will not lend themselves well for this treatment).
Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.
Now, if you still want to do this:
You should be able to make the Spirit library thread safe / reentrant by defining BOOST_SPIRIT_THREADSAFE and linking to libboost_thread. Note this makes the globals used by Spirit threadsafe (at the cost of fine-grained locking) but not your parsers: you can't share your own parsers/rules/sub grammars/expressions across threads. In fact, you can only share your own (Phoenix/Fusion) functors iff they are threadsafe, and any other extensions defined outside the core Spirit library should be audited for thread-safety.
If you manage the above, I think by far the best approach would seem to be to
use boost::spirit::istream_iterator (or, for binary/raw character streams I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that depending on your grammar, quite a bit of memory could be used for buffering and the performance is suboptimal
run the parser on its own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coroutines')
use coarse-grained semantic actions like shown above to pass messages to another logical thread that does the actual processing.
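The three steps above can be sketched independently of Spirit. The following is a Python analogy of the architecture only, not Spirit code: a blocking chunk iterator stands in for the streaming input iterator, splitting on newlines stands in for the grammar, and one queue message per completed line stands in for a coarse-grained semantic action. All names (chunk_source, parse_lines, run_demo) are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking end of input and end of results


def chunk_source(chunks: queue.Queue):
    """Blocking iterator over incoming chunks. Only the parser thread
    ever waits here, so the I/O thread stays free for other work."""
    while True:
        chunk = chunks.get()
        if chunk is _DONE:
            return
        yield chunk


def parse_lines(chunks: queue.Queue, results: queue.Queue):
    """Parser thread: buffers partial input and emits one message per
    complete line, so no half-parsed state is visible to the consumer."""
    pending = ""
    for chunk in chunk_source(chunks):
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split("\n")
        for line in complete:
            results.put(line)
    if pending:
        results.put(pending)
    results.put(_DONE)


def run_demo(pieces):
    chunks, results = queue.Queue(), queue.Queue()
    parser = threading.Thread(target=parse_lines, args=(chunks, results))
    parser.start()
    for piece in pieces:  # the I/O side hands over data as it arrives
        chunks.put(piece)
    chunks.put(_DONE)
    # The consumer reads whole records until the sentinel shows up.
    records = list(iter(results.get, _DONE))
    parser.join()
    return records


if __name__ == "__main__":
    print(run_demo(["a=1\nb", "=2\nc=3"]))
```

The parser keeps its state on its own stack between chunks, which is what removes the need to pause and resume a parse explicitly. The cost is the buffering noted above.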
Some more loose pointers:
you can easily 'fuse' some functions to handle lazy evaluation of your semantic action handlers using BOOST_FUSION_ADAPT_FUNCTION and friends. This reduces the amount of cruft you have to write to get simple things working, like normal C++ overload resolution in semantic actions, especially when you're not using C++0X and BOOST_RESULT_OF_USE_DECLTYPE
Because you will want to avoid semantic actions with side-effects, you should probably look at Inherited Attributes and qi::locals<> to coordinate state across rules in 'pure functional fashion'.

Why shouldn't I use F# asynchronous workflows for parallelism?

I have been learning F# recently, being particularly interested in its ease of exploiting data parallelism. The data |> Array.map |> Async.Parallel |> Async.RunSynchronously idiom seems very easy to understand and straightforward to use and get real value from.
So why is it that async is not really intended for this? Donald Syme himself says that PLINQ and Futures are probably a better choice. And other answers I've read here agree with that as well as recommending TPL. (PLINQ doesn't seem too much different to the above built-in functions, as long as you're using the F# Powerpack to get the PSeq functions.)
F# and functional languages make a lot of sense for this, and some applications have achieved great success with async parallelism.
So why shouldn't I use async to execute parallel data processes? What am I going to lose by writing parallel async code instead of using PLINQ or TPL?

So why shouldn't I use async to execute parallel data processes?
If you have a tiny number of completely independent non-async tasks and lots of cores then there is nothing wrong with using async to achieve parallelism. However, if your tasks are dependent in any way or you have more tasks than cores or you push the use of async too far into the code then you will be leaving a lot of performance on the table and could do a lot better by choosing a more appropriate foundation for parallel programming.
Note that your example can be written even more elegantly using the TPL from F# though:
Array.Parallel.map f xs
What am I going to lose by writing parallel async code instead of using PLINQ or TPL?
You lose the ability to write cache-oblivious code. Consequently, you will suffer from lots of cache misses, with all cores stalling while they wait for shared memory, which means poor scalability on a multicore machine.
The TPL is built upon the idea that child tasks should execute on the same core as their parent with a high probability and, therefore, will benefit from reusing the same data because it will be hot in the local CPU cache. There is no such assurance with async.

I wrote an article that re-implements one C# TPL sample using both Task and Async, which also has some comments on the difference between the two. You can find it here and there is also a more advanced async-based version.
Here is a quote from the first article that compares the two options:
The choice between the two possible implementations depends on many factors. Asynchronous workflows were designed specifically for F#, so they more naturally fit with the language. They offer better performance for I/O bound tasks and provide more convenient exception handling. Moreover, the sequential syntax is quite convenient. On the other hand, tasks are optimized for CPU bound calculations and make it easier to access the result of calculation from other places of the application without explicit caching.

I always figured the answer lies in what TPL, PLINQ, etc. give you over and above what Async does. (Cancellation mechanisms are what comes to mind.) This question has some better answers.
This article hints at a slight performance advantage to TPL, but probably not enough to be significant.