How does Datastax implement its async driver API for Cassandra?

I'm trying to convince a coworker of the benefits of using Session#executeAsync.
However, since we are using the driver from Scala, it would be rather easy to wrap the sync call Session#execute in a Future, and that would be all it takes to turn it into an async call. This would already be an improvement, because it would let us avoid blocking the current thread (in our case, that means blocking the threads that handle HTTP requests in Play, with a huge impact on the number of requests that can be handled concurrently).
I argue that if all the work needed to implement an async driver were wrapping the sync calls in a Future, implementations like ReactiveMongo and the async API for Cassandra from Datastax would not exist.
So,
What are the benefits of using the async API?
How is the async API implemented in the Datastax driver, and what libraries and OS features does it rely on?
What kinds of problems had to be solved beyond the asynchronous network calls? (I mean, implementing the async driver must involve more than just using Java NIO.)
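The "just wrap the sync call in a Future" approach under debate can be sketched in Java terms. Here blockingExecute, the pool size, and the query string are illustrative stand-ins, not the Datastax API; the point is that the caller is freed, but a pool thread is still held for every in-flight call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the "just wrap the sync call in a Future" approach, in Java
// terms. blockingExecute is a stand-in for a blocking driver call like
// Session#execute; it is not the Datastax API.
public class WrappedSync {
    static final ExecutorService pool = Executors.newFixedThreadPool(8);

    static String blockingExecute(String query) {
        return "rows for " + query; // stand-in for a blocking network round-trip
    }

    // The caller is no longer blocked, but one pool thread is held for the
    // whole duration of every in-flight call.
    static Future<String> executeAsync(String query) {
        return pool.submit(() -> blockingExecute(query));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(executeAsync("some query").get());
        pool.shutdown();
    }
}
```

With 8 pool threads, the 9th concurrent "request" waits for a free thread even though all 8 are merely parked on IO, which is the weakness the answer below addresses.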

How is the async API implemented in the Datastax driver, and what libraries and OS features does it rely on?
The Datastax Java driver is based on the Netty networking framework, and Netty itself is based on an event-driven model. For some operating systems Netty also provides native transports to improve performance, e.g. epoll on Linux.
What are the benefits of using the async API?
I'm not a Scala expert, but as far as I know Scala Futures are based on a thread model (execution contexts). That means you need to submit the request to another thread to execute it asynchronously. For an IO task, all that thread does is ask another system and then wait for its response. If you have a large number of requests, all the threads in your pool will be busy yet doing nothing useful. A thread is a fairly expensive resource, and having thousands of them on the same physical machine can be a problem. Threads are good for parallel computation tasks, but not for IO tasks.
On the other hand, the Datastax Java driver is based on an event-driven model (Netty). Each request is submitted to an event-loop queue; on each iteration of the event loop, Netty determines the state of the request and executes the handlers associated with it.
This approach avoids the memory overhead of threads and lets you have thousands of IO requests in flight at the same time. But it also means you should run slow or blocking callbacks on another thread, to avoid blocking the event loop.
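A minimal sketch of that event-loop model (an illustration of the idea only, not Netty's actual implementation): one thread drains a queue of callbacks, so completions for many in-flight requests share a single thread instead of each needing its own.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the event-loop model: one thread drains a queue of
// callbacks, so completions for many in-flight requests share one thread.
// This illustrates the idea only; it is not Netty's actual implementation.
public class MiniEventLoop {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    public void execute(Runnable task) { tasks.add(task); }

    // Drain the queue until empty (a real event loop would also poll IO
    // readiness, e.g. via epoll, and run forever).
    public void runUntilIdle() {
        Runnable task;
        while ((task = tasks.poll()) != null) {
            task.run();
        }
    }

    public static void main(String[] args) {
        MiniEventLoop loop = new MiniEventLoop();
        AtomicInteger completed = new AtomicInteger();
        // Simulate 10_000 request-completed callbacks: no extra threads needed.
        for (int i = 0; i < 10_000; i++) {
            loop.execute(completed::incrementAndGet);
        }
        loop.runUntilIdle();
        System.out.println("handled " + completed.get() + " callbacks on one thread");
    }
}
```

This also makes the answer's caveat concrete: one slow callback in the queue delays every callback behind it, which is why blocking work must be handed off to another thread.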

Related

async await advantages when we have enough threads

I understand that .NET knows how to use multiple threads for multiple requests.
So, if our service probably won't get more requests than the number of threads our server can produce (which looks like a huge number), the only reason I can see to use async is for a single request that does multiple blocking operations which can be done in parallel.
Am I right?
Another advantage may be that serving multiple requests with the same thread is cheaper than using multiple threads. How significant is this difference?
(note: our service has no UI (I saw that there is a single thread for that, but it isn't relevant))
thanks!
Am I right?
No. Executing multiple independent blocking operations is the job of the concurrency APIs anyway (though sometimes they need synchronization, such as a lock or mutex, to maintain object state and avoid race conditions). The point of async/await is to schedule IO operations, like file reads/writes, calls to a remote service, or database reads/writes, which don't need a thread at all: the operating system queues them on IO completion ports.
Benefits of async/await:
It doesn't start an IO operation on a separate thread. A thread is a costly resource, in terms of memory and allocation, and would do little more than wait for the IO call to come back. Separate threads should be used for compute-bound operations, not IO-bound ones.
It frees up the UI / caller thread, keeping it completely responsive to carry out other tasks and operations.
It is the evolution of the Asynchronous Programming Model (BeginXxx/EndXxx), which was fairly complex to understand and implement.
Another advantage may be that serving multiple requests with the same thread is cheaper than using multiple threads. How significant is this difference?
It's a good strategy, depending on the kind of request from the caller: if the requests are compute-bound, it's better to invoke a parallel API and finish them fast; if they are IO-bound, there's async/await. The only issue with multiple threads is resource allocation and context switching, which needs to be factored in. On the other hand, multiple threads do efficiently utilize the processor cores, which are fairly underutilized in current systems; most of the time the processor is lying idle.
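The "no thread blocked on IO" point can be illustrated in Java terms, with CompletableFuture standing in for .NET's Task and a timer standing in for an IO completion port (the names and the simulated completion are illustrative assumptions, not .NET machinery):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the async/await idea in Java terms: CompletableFuture stands in
// for .NET's Task. No thread blocks waiting for the "IO"; a timer thread
// (standing in for an IO completion port) completes the future later, and
// the continuation runs as a callback.
public class NoBlockedThread {
    // Continuation: analogous to the code after `await` in C#.
    static CompletableFuture<String> handle(CompletableFuture<String> response) {
        return response.thenApply(body -> "handled: " + body);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<String> response = new CompletableFuture<>();
        CompletableFuture<String> result = handle(response);

        // Simulated IO completion: no thread was parked waiting for this.
        io.schedule(() -> response.complete("db row"), 50, TimeUnit.MILLISECONDS);

        System.out.println(result.get()); // the demo waits here; the handler never did
        io.shutdown();
    }
}
```

Between registering the continuation and the completion firing, no thread is dedicated to the request, which is exactly the saving the answer describes.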

Using message-oriented middleware for communication within a single web application

I wanted to check the viability of a design that uses message-oriented middleware (MOM) technology, like JMS (e.g. ActiveMQ) or RabbitMQ, for handling asynchronous processing within a single web application, i.e. both the publisher and the subscriber to the MOM server are contained in the same web application.
The rationale behind this design is to offload some heavy-duty processing as a background asynchronous operation. The publisher in this case is a server-side real-time web service method which needs to respond to the calling web service client instantaneously (< 1 sec); it emits a message on a MOM topic. The subscriber, contained in the same web application as the publisher, uses the message to asynchronously carry out the more complex, more time-consuming (5-7 seconds) processing.
With this design we can avoid having to spawn new threads within the application server container for the heavy-duty processing.
Is using a MOM server overkill in this case, where the message publisher and message subscriber live in the same web server address space? From what I have read, MOM technology is mainly used for inter-application communication, and I wanted to check whether it is also fine to use it for intra-application communication.
Let me know your thoughts.
Thanks,
Perhaps you will not think it is a good example, but in the JEE world using JMS for intra-application communication is quite common. Spawning new threads is considered bad practice, message-driven beans make consuming messages relatively easy, and you get transaction support. A compliant application server like GlassFish has JMS on board, so producing and consuming messages does not involve socket communication, as it would with a standalone ActiveMQ. But there might be reasons to have a standalone JMS server, e.g. if there is a cluster of consumers and you want the active instances to take over work from the failed ones... but then the standalone JMS server becomes a single point of failure, and now you want a cluster of those, and so on.
One significant feature of JMS is (optional) message persistence. You may be concerned that the long-running task could fail and the client's request would be lost; but persistent messages are much more expensive, as they cause disk IO.
From what you've described, of the usual features of MOM (asynchronous processing, guaranteed delivery, message ordering) you only need asynchronous processing. So if the guarantees are not important, I would use some kind of thread pool instead.
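The thread-pool alternative suggested above can be sketched as follows. The class and method names are made up for illustration, and slowProcessing stands in for the 5-7 second task from the question:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the suggested alternative: if you only need async processing
// (no persistence, no delivery guarantees), a plain in-process thread pool
// replaces the MOM broker. All names here are illustrative.
public class InProcessOffload {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // The "publisher": the web service method returns immediately and
    // hands the slow work to the pool instead of publishing to a topic.
    public CompletableFuture<String> handleRequest(String payload) {
        return CompletableFuture.supplyAsync(() -> slowProcessing(payload), workers);
    }

    private String slowProcessing(String payload) {
        return "processed:" + payload; // stands in for the 5-7s heavy work
    }

    public void shutdown() { workers.shutdown(); }

    public static void main(String[] args) {
        InProcessOffload app = new InProcessOffload();
        System.out.println(app.handleRequest("order-42").join());
        app.shutdown();
    }
}
```

Note the trade-off the answer mentions: if the JVM dies mid-task, the work is lost; a persistent JMS message would survive, at the cost of disk IO.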

How to best implement a blocking/waiting actor?

I'm fairly new to Akka and to writing concurrent applications, and I'm wondering what's a good way to implement an actor that waits on a Redis list and, once an item becomes available, either processes it or sends it to another actor for processing.
Would using the blocking command BRPOPLPUSH be better, or would a scheduler that asks the actor to poll Redis every second be a better way?
Also, on a typical system, how many of these actors can I spawn concurrently without consuming all the resources the system has to offer? How does one decide how many actors of each type an actor system can handle on the machine it is running on?
As a rule of thumb, you should never block inside receive. Each actor should rely only on the CPU and never wait, sleep, or block on I/O. When these conditions are met you can create even millions of actors working concurrently. Each actor is supposed to have a 600-650 byte memory footprint (see: Concurrency, Scalability & Fault-tolerance 2.0 with Akka Actors & STM).
Back to your main question. Unfortunately there is no official Redis client "compatible" with the Akka philosophy, that is, completely asynchronous. What you need is a client that, instead of blocking, returns you a Future of some sort and lets you register a callback for when the results are available. Such clients exist, e.g. for Perl and node.js.
However, I found the independent project fyrie-redis, which you might find useful. If you are bound to a synchronous client, the best you can do is either:
poll Redis periodically without blocking and inform some actor by sending it a message with the Redis reply, or
block inside an actor and understand the consequences.
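Akka specifics aside, the first option (periodic non-blocking polling) can be sketched with plain JDK scheduling. Here pollOnce is a stand-in for a non-blocking Redis RPOPLPUSH, and the two queues stand in for the Redis list and an actor's mailbox:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Language-neutral sketch of option 1: poll periodically without blocking
// and deliver any reply as a message. pollOnce() stands in for a
// non-blocking Redis RPOPLPUSH; mailbox stands in for an actor's mailbox.
public class PollingSketch {
    static final BlockingQueue<String> source = new LinkedBlockingQueue<>();  // the "Redis list"
    static final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>(); // the "actor"

    static void pollOnce() {
        String item = source.poll();          // non-blocking check
        if (item != null) mailbox.add(item);  // "send a message" with the reply
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(PollingSketch::pollOnce, 0, 100, TimeUnit.MILLISECONDS);
        source.add("job-1");
        System.out.println("received: " + mailbox.take()); // the demo blocks; the poller never does
        scheduler.shutdown();
    }
}
```

In real Akka code the scheduled tick would be a message sent by the Akka scheduler, and delivery would be a plain tell to the processing actor.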
See also
Redis client library recommendations for use from Scala
BRPOPLPUSH will block for a long time (up to the timeout you specify), so I would favour a Scheduler instead: it still blocks, but for a shorter amount of time every second or so.
Whichever way you go, because you are blocking, you should read this section of the Akka docs which describes methods for working with blocking libraries.
Do you have control over the code that inserts the item into Redis? If so, you could have that code send your Akka code a message (maybe over ActiveMQ, using the Akka Camel support) to notify it when the item has been inserted. That would be a more event-driven way of working and would save you from having to poll, or block for very long periods of time.

node.js vs. asp.net async pages

still trying to understand node.js...
If I apply the ASP.NET async pattern for every I/O operation and configure maxWorkerThreads=1, is it (conceptually) similar to node.js?
Does an I/O operation (in either framework) take place on its own thread, or is there some OS functionality for notifications / lightweight threads?
This SO thread says that node.js still uses threads internally, so it is not such a big difference from ASP.NET. Some answers say yes, but that it is a better programming model, etc. Which threads does that question refer to? The lightweight I/O ones I asked about in #2?
See this similar question
As for the I/O operations, that's implementation-specific: the Linux backend uses libev and the Windows backend uses IOCP. See this video on async I/O details for Windows/Linux.
node.js only uses threads internally because Linux doesn't have a general async IO system (the way Windows does with IOCP). So to make async IO possible it needs an internal thread pool. See the video.
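What that answer describes can be sketched as follows (illustrative only, not the actual node.js/libuv code, and written in Java terms): a blocking operation runs on an internal pool thread, and its completion callback is posted back to the single event-loop thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Illustrative sketch (not node.js's actual code): a blocking operation
// runs on an internal pool thread, and its completion callback is posted
// back to the single event-loop thread's queue.
public class ThreadPoolBackedAsync {
    static final BlockingQueue<Runnable> eventLoop = new LinkedBlockingQueue<>();
    static final ExecutorService ioPool = Executors.newFixedThreadPool(4);

    static void readFileAsync(String path, Consumer<String> callback) {
        ioPool.submit(() -> {
            String contents = "<contents of " + path + ">"; // stand-in for a blocking read
            eventLoop.add(() -> callback.accept(contents)); // completion runs on the loop
        });
    }

    public static void main(String[] args) throws Exception {
        readFileAsync("/etc/hosts", contents -> System.out.println("read " + contents));
        eventLoop.take().run(); // the event-loop thread processes one completion
        ioPool.shutdown();
    }
}
```

The caller's code stays single-threaded and callback-driven; the pool threads are an internal detail, which is why node.js can hide them behind an async API.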

Multithreaded comet server library

I'm looking for a multithreaded comet server library: what I need is async IO (using epoll) running on a thread pool (4-8 threads). Tornado would be ideal if it were multithreaded.
Why multithreaded? I need to process and serve data which could come from any connected user. It could be synchronised between Tornado instances using a database, but even NoSQL would be too big a slowdown: almost every request would end up in a database write/update, which isn't a good idea even with async drivers. I can store everything in local volatile memory, so it can be very fast, but then it must run in a single process to avoid inter-process communication. I don't need to scale (a single box is enough), but it MUST be fast. Some data will be stored in MongoDB, but Mongo queries will be something like 5% of the requests.
And an important thing: semaphores (and other higher-level approaches) are not rocket science to me, so I'm not afraid of synchronisation.
Requirements:
async io
non-blocking
thousands of concurrent connections
FAST
basic HTTP features (GET, POST, cookies)
ability to process requests asynchronously (do something, make an async call with a callback (e.g. a database query), process the callback, return data)
thread pool
C++/Java/Python
simple and lightweight
It would be nice to have an async Mongo driver too.
I've looked into Boost ASIO and it seems capable of doing what I need, but I want to focus on the application, not on writing HTTP request processing.
I've read about Tornado (seems ideal but is single-threaded), Simple (not sure whether it can process a request asynchronously and return data after an async call), and Boost ASIO (very nice, but too low-level).
Well, after more digging I decided to change technology: I decided to create my own protocol on top of TCP, using Netty.
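The shape being asked for (async IO handled by a small, fixed set of loop threads, which Netty offers as an EventLoopGroup) can be sketched like this. The class is illustrative, not Netty's API; the key idea is that each connection is pinned to one loop thread, so its handlers and in-memory state need no locking:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "async IO on a small thread pool": a fixed set of loop threads,
// with each connection assigned round-robin to one of them so that all its
// events run on a single thread. Illustrative only, not Netty's API.
public class LoopGroup {
    private final ExecutorService[] loops;
    private final AtomicInteger next = new AtomicInteger();

    LoopGroup(int nThreads) {
        loops = new ExecutorService[nThreads];
        for (int i = 0; i < nThreads; i++) loops[i] = Executors.newSingleThreadExecutor();
    }

    // Pin a connection to one loop: all its events run on that thread.
    ExecutorService assign() {
        return loops[Math.floorMod(next.getAndIncrement(), loops.length)];
    }

    void shutdown() { for (ExecutorService l : loops) l.shutdown(); }

    public static void main(String[] args) throws Exception {
        LoopGroup group = new LoopGroup(4);
        ExecutorService conn = group.assign();
        Future<String> reply = conn.submit(() -> "handled on a pinned loop thread");
        System.out.println(reply.get());
        group.shutdown();
    }
}
```

Shared state touched from different connections (the "data from every connected user" in the question) still needs synchronisation, since the connections live on different loop threads.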
