Asynchronous connection pool implementation in Rust

I have a Tokio TCP back-end application which, briefly, after receiving a request, reads something from Redis, writes something to PostgreSQL, uploads something via HTTP, sends something to RabbitMQ, etc. Processing each request takes a long time, so a separate task is created for each request. Since a single connection cannot simply be shared across concurrent tasks in the asynchronous model, some connection pooling is required. For now, new connections are established on every request, which is extremely wasteful.
I have been looking for an asynchronous connection pool implementation in Rust, but have not found any that are up to date.
I would like to hear some advice on how to implement it myself.
The only idea I have come up with is:
Implement a Stream/Sink object with an inner collection of connections. It does not matter whether it is LIFO or FIFO, since the connections are identical. On application startup, N connections are allocated.
Now, I am not sure whether it is possible to share such a pool among tasks, but if it were, tasks would poll the stream for a connection instance (instead of establishing their own), use it, and then put it back.
If no connections were available, the stream might establish more of them or ask the task to wait (depending on its configuration).
If a connection fails, it gets dropped and the pool now contains N-1 connections, so it may decide to allocate a new one on the next request.
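For what it's worth, here is a minimal sketch of this design expressed with today's Tokio (1.x) primitives rather than the futures 0.1 Stream/Sink API: a bounded channel doubles as the pool, so recv() is the checkout (it parks the task when the pool is empty) and send() is the put-back. The address and pool size are placeholders; crates such as bb8 and deadpool now implement this pattern properly.

```rust
use std::sync::Arc;
use tokio::net::TcpStream;
use tokio::sync::{mpsc, Mutex};

struct Pool {
    tx: mpsc::Sender<TcpStream>,
    rx: Mutex<mpsc::Receiver<TcpStream>>,
}

impl Pool {
    // Allocate N connections up front; the channel capacity is the pool size.
    async fn connect(addr: &str, n: usize) -> std::io::Result<Arc<Self>> {
        let (tx, rx) = mpsc::channel(n);
        for _ in 0..n {
            tx.send(TcpStream::connect(addr).await?).await.expect("capacity >= n");
        }
        Ok(Arc::new(Pool { tx, rx: Mutex::new(rx) }))
    }

    // Check a connection out; if none is free, this awaits until one is returned.
    async fn checkout(&self) -> TcpStream {
        self.rx.lock().await.recv().await.expect("pool closed")
    }

    // Return a healthy connection; a failed one would be dropped and replaced instead.
    async fn put_back(&self, conn: TcpStream) {
        let _ = self.tx.send(conn).await;
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // "127.0.0.1:6379" and 8 are placeholders for the real endpoint and N.
    let pool = Pool::connect("127.0.0.1:6379", 8).await?;
    let mut tasks = Vec::new();
    for _ in 0..32 {
        let pool = Arc::clone(&pool);
        tasks.push(tokio::spawn(async move {
            let conn = pool.checkout().await; // the pool is shared via Arc, not the connection
            // ... use conn for one request ...
            pool.put_back(conn).await;
        }));
    }
    for t in tasks {
        t.await.unwrap();
    }
    Ok(())
}
```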
So I have two problems for which I cannot find proper answers anywhere:
Must/can/should I share the stream/sink pool among tasks in some way? I do see a Shared future in the futures crate.
There are some murky points in the tokio/futures tutorial. For example, it does not explain how to notify the uppermost task, that is, how to implement the mythical innermost future, which does not poll anything itself but still has to notify the futures above it.
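The question predates std::task (futures 0.1 spelled this task::current().notify()), but the "innermost future" pattern in today's vocabulary is: stash the Waker handed to you in poll, return Pending, and call wake() later. A minimal sketch, where Signal is a made-up name:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

// Shared state between the future and whoever completes it.
#[derive(Default)]
struct Shared {
    ready: bool,
    waker: Option<Waker>,
}

// A leaf ("innermost") future: it polls nothing itself.
#[derive(Clone, Default)]
struct Signal(Arc<Mutex<Shared>>);

impl Signal {
    // Called from anywhere (another task, a thread, a callback) to
    // complete the future and wake the task that is awaiting it.
    fn notify(&self) {
        let mut s = self.0.lock().unwrap();
        s.ready = true;
        if let Some(w) = s.waker.take() {
            w.wake(); // tells the executor to re-poll the whole task
        }
    }
}

impl Future for Signal {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.0.lock().unwrap();
        if s.ready {
            Poll::Ready(())
        } else {
            // Not ready: remember who to wake, then park.
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}
```

The wake-up goes straight to the executor, which re-polls the top-level task; the futures in between are re-entered by that poll, so none of them has to forward notifications by hand.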
Or is my approach completely wrong? I could start playing with it by myself, but I have a strong suspicion that I have missed something, e.g. a one-click solution.

Related

How do I create a memory bound message queue in Erlang?

I want the speed of asynchronous messages but still want some flow control. How can I accomplish this in Erlang?
There is no process memory limit right now; this has been discussed on the mailing list, and you can look at those threads.
On the upside, when you use an OTP pattern implementation like gen_server, you have a lot of freedom in retrieving messages from the process queue and in measuring the queue length.
The gen_server2 used in RabbitMQ used to optimize this by moving messages into an internal data structure.
With that in place, you can discard any new incoming message when the internal queue is too long.
You can do this silently, or notify the sender that the message was rejected.
All of that is at a very low level; RabbitMQ provides this functionality at the AMQP level.
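Language aside, the bounded-mailbox-with-rejection pattern itself is small. A sketch in Rust/Tokio terms (matching the headline question's stack, not the Erlang internals above): try_send either enqueues the message or immediately hands it back, so the producer can drop it silently or notify the sender.

```rust
use tokio::sync::mpsc::{channel, error::TrySendError};

#[tokio::main]
async fn main() {
    // Bounded mailbox: at most 2 queued messages (tiny, to show rejection).
    let (tx, mut rx) = channel::<String>(2);

    for i in 0..4 {
        match tx.try_send(format!("msg {i}")) {
            Ok(()) => {}
            // Queue too long: drop silently, or tell the sender it was rejected.
            Err(TrySendError::Full(msg)) => eprintln!("rejected: {msg}"),
            Err(TrySendError::Closed(_)) => eprintln!("consumer gone"),
        }
    }
    drop(tx); // close the channel so the drain loop below terminates

    // The consumer drains at its own pace.
    while let Some(msg) = rx.recv().await {
        println!("processing {msg}");
    }
}
```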
A common and quite good way of enforcing flow control is to turn well-selected messages into calls, which limits each client to one outstanding request on the server, effectively providing feedback in an extremely simple way. The trick, of course, is picking which communications use synchronous calls :-)
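The "turn messages into calls" trick, in the same Rust/Tokio vocabulary: pair each request with a oneshot acknowledgement and await it, so a client can never have more than one request in flight. The channel shapes here are illustrative, not from the original answer.

```rust
use tokio::sync::{mpsc, oneshot};

// A request paired with a channel for its acknowledgement.
type Call = (String, oneshot::Sender<()>);

// Client side: sending becomes a call; awaiting the ack is the flow control.
async fn call(tx: &mpsc::Sender<Call>, msg: String) {
    let (ack_tx, ack_rx) = oneshot::channel();
    tx.send((msg, ack_tx)).await.expect("server gone");
    ack_rx.await.expect("server dropped the request");
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Call>(1);

    // Server side: process, then ack, releasing the caller.
    tokio::spawn(async move {
        while let Some((msg, ack)) = rx.recv().await {
            println!("processing {msg}");
            let _ = ack.send(());
        }
    });

    for i in 0..3 {
        call(&tx, format!("msg {i}")).await; // paced by the server's acks
    }
}
```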

QuickFix using Pipes, shared memory, message queues etc

Here's my scenario:
In my application I have several processes which communicate with each other using QuickFIX, which internally uses TCP sockets. The flow is:
Process 1 sends a QuickFIX message -> process 2 sends a QuickFIX message after processing the message from process 1 -> ... -> process n
Similarly, the acknowledgement messages flow back: process n -> ... -> process 1.
Now, all of these processes except the last one (process n) are on the same machine.
I googled and found that TCP sockets are the slowest of the IPC mechanisms.
So, is there a way to transmit and receive QuickFIX messages (obviously using their APIs) through other IPC mechanisms? If yes, I could then reduce latency by using that IPC mechanism between all the processes that are on the same machine.
However, if I do so, do those mechanisms guarantee delivery of complete messages the way TCP sockets do?
I think you are doing premature optimization, and I don't think that TCP will be your performance bottleneck. Your local latency will be far lower than that of your exterior FIX connection. From experience, I'd expect performance issues to originate in your app's message handling (perhaps due to accidental blocking in OnMessage() callbacks) rather than in the IPC going on afterward.
Advice: write your communication component with an abstraction-layer interface so that later down the line you can swap out TCP for something else (e.g. ActiveMQ, ZeroMQ, or whatever else you may consider) if you decide you need it.
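Sketched in Rust for brevity (the idea is language-neutral, and QuickFIX apps are usually C++ or Java): callers program against a small transport trait, and the TCP implementation can later be replaced by a queue- or shared-memory-backed one without touching business logic. FixTransport and its two methods are invented names, not part of QuickFIX.

```rust
// The abstraction layer: business code only ever sees this trait.
trait FixTransport: Send {
    fn send(&mut self, msg: &[u8]) -> std::io::Result<()>;
    fn recv(&mut self, buf: &mut Vec<u8>) -> std::io::Result<usize>;
}

// One concrete transport; a SharedMemTransport or QueueTransport
// could be dropped in later behind the same trait.
struct TcpTransport {
    stream: std::net::TcpStream,
}

impl FixTransport for TcpTransport {
    fn send(&mut self, msg: &[u8]) -> std::io::Result<()> {
        use std::io::Write;
        self.stream.write_all(msg)
    }
    fn recv(&mut self, buf: &mut Vec<u8>) -> std::io::Result<usize> {
        use std::io::Read;
        buf.resize(4096, 0);
        let n = self.stream.read(buf)?;
        buf.truncate(n);
        Ok(n)
    }
}
```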
Aside from that, just focus on making your system work correctly. Once you are sure the behavior is correct (hopefully with tests to confirm it), then you can work on performance. Measure your performance before making any optimizations, and then measure again after you make "improvements". Don't trust your gut; get numbers.
Although it would be good to hear more details about the requirements associated with this question, I'd suggest looking at a shared memory solution. I'm assuming that you are running a server in a colocated facility with the trade matching engine and using high-speed, kernel-bypass communication for external connectivity. One of the issues with TCP is the user/kernel-space transition. I'd recommend considering user-space shared memory for IPC, using a busy-polling technique for synchronization rather than synchronization mechanisms that might also involve kernel transitions.
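A toy illustration of the busy-polling handshake, using two threads and an atomic in place of a real cross-process mapping (which would need shm_open/mmap or an equivalent): the reader spins in user space rather than making a kernel transition to sleep.

```rust
use std::hint;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // One-slot mailbox: 0 means empty, anything else is a message.
    let slot = Arc::new(AtomicU64::new(0));

    let writer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            for msg in 1..=5u64 {
                // Spin until the reader has drained the slot; no syscall, no sleep.
                while slot
                    .compare_exchange(0, msg, Ordering::AcqRel, Ordering::Acquire)
                    .is_err()
                {
                    hint::spin_loop();
                }
            }
        })
    };

    let reader = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            let mut received = 0;
            while received < 5 {
                match slot.swap(0, Ordering::AcqRel) {
                    0 => hint::spin_loop(), // empty: keep polling
                    msg => {
                        println!("got {msg}");
                        received += 1;
                    }
                }
            }
        })
    };

    writer.join().unwrap();
    reader.join().unwrap();
}
```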

How to best implement a blocking/waiting actor?

I'm fairly new to Akka and writing concurrent applications, and I'm wondering what's a good way to implement an actor that waits on a Redis list and, once an item becomes available, processes it or sends it to a different actor for processing.
Would using the blocking command BRPOPLPUSH be better, or would a scheduler that asks the actor to poll Redis every second be the better way?
Also, on a normal system, how many of these actors can I spawn concurrently without consuming all the resources the system has to offer? How does one decide how many actors of each type an actor system should be able to handle on the system it's running on?
As a rule of thumb, you should never block inside receive. Each actor should rely only on CPU and never wait, sleep, or block on I/O. When these conditions are met, you can create even millions of actors working concurrently. Each actor is supposed to have a memory footprint of only 600-650 bytes (see: Concurrency, Scalability & Fault-tolerance 2.0 with Akka Actors & STM).
Back to your main question. Unfortunately, there is no official Redis client "compatible" with the Akka philosophy, that is, completely asynchronous. What you need is a client that, instead of blocking, returns a Future object of some sort and lets you register a callback for when results are available. There are such clients for, e.g., Perl and Node.js.
However, I found the independent project fyrie-redis, which you might find useful. If you are bound to a synchronous client, the best you can do is either:
poll Redis periodically without blocking, and inform some actor by sending it a message with the Redis reply, or
block inside an actor and understand the consequences.
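The first option, translated to the Rust/Tokio idiom of the headline question (the Akka/Scala shape is analogous): a periodic tick polls without tying up a shared thread, and replies are forwarded to the "actor" as channel messages. poll_redis is a hypothetical stub standing in for a real async client call.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::interval;

// Hypothetical stand-in for a non-blocking list pop via a real async client.
async fn poll_redis() -> Option<String> {
    None
}

async fn poller(actor: mpsc::Sender<String>) {
    let mut tick = interval(Duration::from_secs(1));
    loop {
        tick.tick().await; // yields to the scheduler instead of blocking a thread
        if let Some(item) = poll_redis().await {
            // "Inform some actor by sending it a message with the Redis reply."
            if actor.send(item).await.is_err() {
                break; // the consuming actor has shut down
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(16);
    tokio::spawn(poller(tx));
    while let Some(item) = rx.recv().await {
        println!("processing {item}"); // the "actor" handling each item
    }
}
```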
See also
Redis client library recommendations for use from Scala
BRPOPLPUSH will block for a long time (up to the timeout you specify), so I would favour a scheduler instead, which still blocks, but for a shorter amount of time every second or so.
Whichever way you go, because you are blocking, you should read this section of the Akka docs which describes methods for working with blocking libraries.
Do you have control over the code that inserts the item into Redis? If so, you could have that code send your Akka code a message (maybe over ActiveMQ, using the Akka Camel support) to notify it when the item has been inserted into Redis. This is a more event-driven way of working and saves you from having to poll or block for long periods of time.

{ ProcessName, NodeName } ! Message VS rpc:call/4 VS HTTP/1.1 across Erlang Nodes

I have a setup in which two nodes are going to communicate a lot. On Node A there will be thousands of processes, which are meant to access services on Node B. There is going to be a massive load of requests and responses across the two nodes. The two nodes will run on two different hardware servers.
I have three options: HTTP/1.1, rpc:call/4, and directly sending a message to a registered gen_server on Node B. Let me explain each option.
HTTP/1.1: Suppose that on Node A I have an HTTP client like ibrowse, and on Node B a web server like Yaws-1.95, with the web server able to handle unlimited connections and the operating system settings tweaked to allow Yaws to handle all of them. I would then make my processes on Node A communicate using HTTP. In this case, each method call would mean a single HTTP request and a reply. I believe there is overhead here, but we are evaluating options. The built-in Erlang mechanism called webtool may be built for this kind of purpose.
rpc:call/4: I could simply make direct rpc calls from Node A to Node B. I am not very sure how the underlying rpc mechanism works, but I think that when two Erlang nodes connect via net_adm:ping/1, the created connection is not closed; rather, all rpc calls use this pipe to transmit requests and pass responses. Please correct me on this one.
Sending a message from Node A to Node B: I could make my processes on Node A just send messages to a registered process, or a group of processes, on Node B. This too seems a clean option.
Q1. Which of the above options would you recommend, and why, for an application in which two Erlang nodes are going to have enormous communication between them all the time? Imagine a messaging system in which the two Erlang nodes are the routers :)
Q2. Which of the above methods is cleaner, less problematic, and more fault tolerant (meaning the method should have no single point of failure that could leave all processes on Node A blind)?
Q3. For the mechanism of your choice: how would you make it even more fault tolerant or redundant?
Assumptions: the nodes are always alive and will never go down, the network connection between the nodes will always be available and uncongested (dedicated to the two nodes only), and the operating system has allocated maximum resources to these two nodes. Thank you for your evaluations.
HTTP is definitely out. Just the round-trip overhead of creating a new connection is a problem.
As for Erlang connections and using pids, you have the advantage that you can subscribe to node-down messages and handle the case where a node goes down. A single TCP connection should be able to give you very fast speeds; however, be aware that it works like one long pipe: messages are muxed and demuxed on that pipe, which can affect latency on the line. It also means that large messages will block small messages from getting through.
How much bandwidth are you aiming for, and at what latency? What are the 95th and 99th percentiles for answering messages? It is better to put up some rough numbers and try to hit those targets than to aim for "as fast as possible". Set your success criteria first.
Q1: HTTP will add extra overhead and, in my opinion, give you nothing. HTTP would be useful if you were designing a REST API. Directly sending messages and rpc:call look about the same as far as overhead is concerned.
Q2: Sending messages is much, much clearer. It's the way Erlang is designed. With rpc calls you must always track which call is executed where and under which circumstances, which can be a huge issue if the two servers have state. Also, rpc calls are synchronous.
Q3: I would use UBF if I could afford the minor overhead; otherwise I would directly send messages between the Erlang nodes. If bandwidth is an issue, other trickery would be needed as well, such as encoding the messages in some way and then using a compression algorithm to reduce their size; alternatively, I might ditch Erlang message passing altogether and use UDP sockets.
It is not obvious that ! is the best way to go. Definitely, it is the easiest and the code will be the most elegant.
In terms of scalability, take into consideration that to use rpc/! you have to maintain an Erlang cluster. I found it painful with just 10-20 nodes, even in a private cloud. I would never recommend bigger deployments on e.g. EC2, where I/O, latency, and the network are not deterministic.
I recommend structuring the project in a way that lets you swap out the communication engine in the future. HTTP is pretty heavy, but there are other options:
socket-socket (tcp/udp/sctp)
amqp (many benefits connected to load balancing)
zeromq (even nicer than amqp)
Betting on !/rpc and an OTP cluster is risky. You will fight full-mesh overhead, master-election algorithms, and quorum/partition detection.

Event Loop vs Multithread blocking IO

I was reading a comment about server architecture.
http://news.ycombinator.com/item?id=520077
In this comment, the person says 3 things:
The event loop, time and again, has been shown to truly shine for a high number of low activity connections.
In comparison, a blocking IO model with threads or processes has been shown, time and again, to cut down latency on a per-request basis compared to an event loop.
On a lightly loaded system the difference is indistinguishable. Under load, most event loops choose to slow down, most blocking models choose to shed load.
Are any of these true?
And also another article here titled "Why Events Are A Bad Idea (for High-concurrency Servers)"
http://www.usenix.org/events/hotos03/tech/vonbehren.html
Typically, if the application is expected to handle millions of connections, you can combine the multi-threaded paradigm with the event-based one.
First, spawn N threads, where N == the number of cores/processors on your machine. Each thread will have a list of asynchronous sockets that it is supposed to handle.
Then, for each new connection from the acceptor, "load-balance" the new socket to the thread with the fewest sockets.
Within each thread, use an event-based model for all the sockets, so that each thread can actually handle multiple sockets "simultaneously."
With this approach,
You never spawn a million threads. You just have as many as your system can handle.
You utilize the event-based model on multiple cores, as opposed to a single core.
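A sketch of that layout in Rust/Tokio terms: one single-threaded event loop per core, a hand-off channel per loop, and an acceptor that routes each new socket to the least-loaded loop. The address is a placeholder, and the bookkeeping is deliberately naive (the per-loop count is never decremented when a connection closes).

```rust
use std::net::TcpListener;
use std::thread;
use tokio::sync::mpsc;

fn main() -> std::io::Result<()> {
    let cores = thread::available_parallelism()?.get();
    let mut workers: Vec<(mpsc::UnboundedSender<std::net::TcpStream>, usize)> = Vec::new();

    // One single-threaded event loop per core.
    for _ in 0..cores {
        let (tx, mut rx) = mpsc::unbounded_channel::<std::net::TcpStream>();
        thread::spawn(move || {
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .unwrap();
            rt.block_on(async move {
                while let Some(sock) = rx.recv().await { // hand-off from the acceptor
                    sock.set_nonblocking(true).unwrap();
                    let sock = tokio::net::TcpStream::from_std(sock).unwrap();
                    tokio::spawn(async move {
                        // ... event-driven handling of this socket ...
                        let _ = sock;
                    });
                }
            });
        });
        workers.push((tx, 0));
    }

    // Acceptor: route each new socket to the loop with the fewest sockets.
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for sock in listener.incoming().flatten() {
        let i = (0..workers.len()).min_by_key(|&i| workers[i].1).unwrap();
        if workers[i].0.send(sock).is_ok() {
            workers[i].1 += 1; // naive count; real code would track closes too
        }
    }
    Ok(())
}
```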
Not sure what you mean by "low activity", but I believe the major factor is how much you actually need to do to handle each request. Assuming a single-threaded event loop, no other clients would get their requests handled while you handled the current request. If you need to do a lot of work to handle each request ("a lot" meaning something that takes significant CPU and/or time), and assuming your machine can actually multitask efficiently (that taking time does not mean waiting on a shared resource, as on a single-CPU machine or similar), you would get better performance by multitasking. Multitasking could be a multithreaded blocking model, but it could also be a single-tasking event loop collecting incoming requests, farming them out to a multithreaded worker factory that handles them in turn (through multitasking) and sends you a response as soon as possible.
I don't believe slow connections with the clients matter that much, as I would expect the OS to handle that efficiently outside of your app (assuming you do not block the event loop for multiple round trips with the client that initiated the request), but I haven't tested this myself.
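The "event loop in front, worker factory behind" split described above maps, in Tokio terms, onto spawn_blocking: the async loop stays responsive while the CPU-heavy handling runs on a separate thread pool. A minimal sketch:

```rust
use tokio::task;

// The async side never does heavy work itself; CPU-bound handling is
// farmed out to the blocking-thread pool, and the response is produced
// when that work completes.
async fn handle_request(payload: Vec<u8>) -> Vec<u8> {
    task::spawn_blocking(move || {
        // ... expensive, CPU-bound processing ("a lot of stuff") ...
        payload
    })
    .await
    .expect("worker panicked")
}
```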
