Queue system recommendation approach - asynchronous

We have a bus reservation system running in GKE in which we handle the creation of reservations on different threads. Because of that, CRUD Java methods can sometimes run simultaneously against the same bus, and only the LAST of the simultaneous updates is saved to our DB (the other simultaneous updates are lost).
Even though the probability is low (the simultaneous updates need to be really close, within 1-2 seconds), we need to avoid this. My question is about how best to address it:
Lock the bus object and return an error to the other simultaneous requests (see the sketch after this list)
Use an in-memory map or Redis cache to track the bus requests
Use GCP Pub/Sub, Kafka or RabbitMQ as a queue system.
Try to focus the efforts on reducing the simultaneous time window (from 1-2 seconds down to milliseconds)
Others?
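For the first option, one common way to reject the losing request without holding a lock is optimistic locking. Below is a minimal sketch assuming the service persists buses through JPA/Hibernate (the question does not say so); the Bus entity and its fields are invented for illustration.

    // Hypothetical sketch: optimistic locking with a JPA @Version column, so a
    // concurrent update based on stale data fails instead of silently
    // overwriting the other update.
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Bus {

        @Id
        private Long id;

        private int seatsAvailable;

        // JPA increments this column on every successful update; committing an
        // update made against a stale copy throws OptimisticLockException.
        @Version
        private long version;

        // getters and setters omitted
    }

On the service side you would catch OptimisticLockException (or the Spring equivalent) and translate it into a conflict error for the losing request, or re-read the bus and retry the update.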
Also, we are worried that GKE request-handling scalability may become an issue in the future. If we manage a relatively large number of buses, would we need to implement a queue system between the client and the server? Or will the GKE load balancer and Ambassador already manage it for us? If we do need a queue system in the future, could it also be used for the collision problem we are facing now?
Last, the reservation requests from the client often take a while. Therefore, we are changing the requests to be handled asynchronously, with long polling from the client to check the task status. Could we link this solution to the current problem, for example by using the Redis cache or the queue system to track the task status? Or should we try to keep the requests synchronous and focus on reducing the processing time (which may be quite difficult)?
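On that last point, a rough sketch of the long-polling status pattern is shown below, with Redis as the status store via the Jedis client. The class, method, and key names are invented for illustration, and the actual reservation work is a placeholder.

    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;

    // Hypothetical sketch: accept the reservation, return a task ID at once,
    // and let the client long-poll getStatus() while the work runs in the
    // background. Redis keys and class names are illustrative only.
    public class ReservationTaskService {

        private final JedisPool pool = new JedisPool("localhost", 6379);

        /** Accepts a reservation request and returns a task ID the client can poll. */
        public String submitReservation(String busId, String seat) {
            String taskId = UUID.randomUUID().toString();
            setStatus(taskId, "PENDING");

            CompletableFuture.runAsync(() -> {
                try {
                    reserveSeat(busId, seat);   // existing CRUD logic goes here
                    setStatus(taskId, "DONE");
                } catch (Exception e) {
                    setStatus(taskId, "FAILED");
                }
            });
            return taskId;
        }

        /** Called by the client's long-polling loop. */
        public String getStatus(String taskId) {
            try (Jedis redis = pool.getResource()) {
                String status = redis.get("task:" + taskId);
                return status != null ? status : "UNKNOWN";
            }
        }

        private void setStatus(String taskId, String status) {
            try (Jedis redis = pool.getResource()) {
                redis.set("task:" + taskId, status);
            }
        }

        private void reserveSeat(String busId, String seat) {
            // placeholder for the real reservation logic
        }
    }

If the reservation work later moves behind Pub/Sub or Kafka, the consumer can update the same status keys, so the polling endpoint the client uses does not have to change.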

Related

Design to support a fast and slow client

I have a situation where I host a high-RPS, highly available service that receives requests, aka commands. These commands have to be sent to N downstream clients, who actually execute them. Each downstream client is a separate microservice and has different constraints, like mode (sync, async), execution cadence, etc.
Should a slow downstream client build the logic to receive all requests and execute them in batches as it wants? Or should my service build the logic to talk to slow and fast clients by maintaining state for commands across downstream clients? Share your opinions.
Not enough info to give any prescriptive advice, but I'd start with dividing the tasks into async and sync first. Those are 2 completely different workloads that, most likely, would require different implementation stacks. I'll give you an idea of what you can start with in the world of AWS...
Not knowing what you mean by async, I'd default to a message-bus setup. In that case you can use something like Amazon Kinesis or Kafka for ingestion purposes, kicking off a Lambda function or an EC2 instance. If the clients need to be notified of a finished job, they can either long-poll an SQS queue, subscribe to an SNS topic, or use MQTT with WebSockets for a long-running connection.
The sync tasks are easier, since it's all about processing power. Just make sure you have your EC2 instances in an auto-scaling group behind an ALB or API Gateway to scale out, and in, appropriately.
This is a very simple answer since I don't have any details needed to be more precise, but this should give you an idea of where to get started.
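As a rough illustration of the "long-poll an SQS queue" part of that setup, here is a sketch using the AWS SDK for Java v2; the queue URL, the message handling, and the 20-second wait are placeholders, not a prescription.

    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
    import software.amazon.awssdk.services.sqs.model.Message;
    import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

    // Sketch only: a client long-polling an SQS queue for job-completion
    // notifications. The queue URL below is a placeholder.
    public class CompletionPoller {
        public static void main(String[] args) {
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/job-done";
            try (SqsClient sqs = SqsClient.create()) {
                while (true) {
                    ReceiveMessageRequest request = ReceiveMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .maxNumberOfMessages(10)
                            .waitTimeSeconds(20)   // long polling: wait up to 20 seconds
                            .build();
                    for (Message m : sqs.receiveMessage(request).messages()) {
                        System.out.println("Job finished: " + m.body());
                        sqs.deleteMessage(DeleteMessageRequest.builder()
                                .queueUrl(queueUrl)
                                .receiptHandle(m.receiptHandle())
                                .build());
                    }
                }
            }
        }
    }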

Logging Web API calls MVC 4

I need to log to the database every call to my Web API.
Now of course I don't want to go to my database on every call.
So let's say I have a dictionary or a hash table object in my cache, and every 10,000 records I go to the database.
I still don't want every 10,000th user to have to wait for this operation.
And I can't start a different thread for long operations, since the application pool can be recycled at basically any time.
What is the best solution for this scenario?
Thanks
I would argue that your view of durability is rather inconsistent. Your cache of 10000 objects could also be lost at any time due to an app pool recycle or server crash.
But to the original question of how to perform a large operation without causing the user to wait:
Put constraints on app pool recycling and deal with the potential data loss.
Periodically dump the cached messages to a Windows service for further processing. This is still not 100% guaranteed to preserve data, e.g. the service/server could crash.
Use a message queue (MSMQ), possibly with WCF. A message queue can persist to disk, so this can be considered reasonably reliable.
Message Queuing (MSMQ) technology enables applications running at different times to communicate across heterogeneous networks and systems that may be temporarily offline. Applications send messages to queues and read messages from queues.
Message Queuing provides guaranteed message delivery, efficient routing, security, and priority-based messaging. It can be used to implement solutions to both asynchronous and synchronous scenarios requiring high performance.
Taking this a step further...
Depending on your requirements and/or environment, you could probably eliminate your cache, and write all messages immediately (and rapidly) to a message queue and not worry about performance loss or a large write operation.
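The question is about ASP.NET, but the buffer-and-flush idea itself is language-neutral. Below is a rough sketch of that pattern (written in Java rather than C#, purely for illustration): a bounded in-memory queue is drained to the database in batches by a background task. It carries exactly the durability caveat discussed above, since anything still buffered is lost if the process dies.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative only: API calls are appended to an in-memory queue and a
    // background task flushes them to the database in batches.
    public class ApiCallLogger {

        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(100_000);
        private final ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor();

        public ApiCallLogger() {
            flusher.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
        }

        /** Called on every API request; never blocks the caller. */
        public void log(String entry) {
            buffer.offer(entry);   // silently drops the entry if the buffer is full
        }

        private void flush() {
            List<String> batch = new ArrayList<>();
            buffer.drainTo(batch, 10_000);
            if (!batch.isEmpty()) {
                writeBatchToDatabase(batch);
            }
        }

        private void writeBatchToDatabase(List<String> batch) {
            // placeholder: one bulk INSERT instead of one round trip per call
        }
    }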

How to best implement a blocking/waiting actor?

I'm fairly new to Akka and to writing concurrent applications, and I'm wondering what's a good way to implement an actor that waits on a Redis list and, once an item becomes available, processes it or sends it to a different actor to process?
Would using the blocking function BRPOPLPUSH be better, or would a scheduler that will ask the actor to poll redis every second be a better way?
Also, on a normal system, how many of these actors can I spawn concurrently without consuming all the resources the system has to offer? How does one decide how many actors of each type an actor system should be able to handle on the system it's running on?
As a rule of thumb you should never block inside receive. Each actor should rely only on the CPU and never wait, sleep, or block on I/O. When these conditions are met you can create even millions of actors working concurrently. Each actor is supposed to have a 600-650 byte memory footprint (see: Concurrency, Scalability & Fault-tolerance 2.0 with Akka Actors & STM).
Back to your main question. Unfortunately there is no official Redis client "compatible" with the Akka philosophy, that is, completely asynchronous. What you need is a client that, instead of blocking, will return you a Future object of some sort and allow you to register a callback for when results are available. There are such clients, e.g. for Perl and node.js.
However, I found the independent fyrie-redis project, which you might find useful. If you are bound to a synchronous client, the best you can do is either:
poll Redis periodically without blocking and inform some actor by sending it a message with the Redis reply, or
block inside an actor and understand the consequences
See also
Redis client library recommendations for use from Scala
BRPOPLPUSH will block for a long time (up to the timeout you specify), so I would favour a Scheduler instead, which still blocks, but for a shorter amount of time every second or so.
Whichever way you go, because you are blocking, you should read this section of the Akka docs which describes methods for working with blocking libraries.
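Here is a rough sketch of the scheduler-style approach, using Akka's classic Java API (2.6-era timers) and the synchronous Jedis client with the non-blocking RPOPLPUSH instead of BRPOPLPUSH; the actor names, Redis keys, and one-second interval are illustrative only.

    import java.time.Duration;
    import akka.actor.AbstractActorWithTimers;
    import akka.actor.ActorRef;
    import akka.actor.Props;
    import redis.clients.jedis.Jedis;

    // Sketch only: a timer-driven actor that polls a Redis list once per second
    // with the non-blocking RPOPLPUSH and hands any item to a worker actor.
    public class RedisPollingActor extends AbstractActorWithTimers {

        private static final String POLL = "poll";

        private final Jedis redis = new Jedis("localhost", 6379);
        private final ActorRef worker;

        public static Props props(ActorRef worker) {
            return Props.create(RedisPollingActor.class, () -> new RedisPollingActor(worker));
        }

        public RedisPollingActor(ActorRef worker) {
            this.worker = worker;
        }

        @Override
        public void preStart() {
            getTimers().startTimerAtFixedRate("poll-timer", POLL, Duration.ofSeconds(1));
        }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .matchEquals(POLL, msg -> {
                        // Returns null immediately when the list is empty, so the
                        // actor never blocks inside receive.
                        String item = redis.rpoplpush("tasks", "tasks:processing");
                        if (item != null) {
                            worker.tell(item, getSelf());
                        }
                    })
                    .build();
        }

        @Override
        public void postStop() {
            redis.close();
        }
    }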
Do you have control over the code that is inserting the item into Redis? If so, you could get that code to send your Akka code a message (maybe over ActiveMQ using the Akka Camel support) to notify it when the item has been inserted into Redis. This would be a more event-driven way of working and prevent you from having to poll or block for long periods of time.

When a queue should be used?

Suppose we were to implement a network application, such as a chat with a central server and several clients: we assume that all communication must go through the central server, which then picks up messages from some clients and forwards them to the target clients, and so on.
Regardless of the technology used (sockets, web services, etc..), it is possible to think that there are some producer threads (that generate messages) and some consumer threads (that read messages).
For example, you could use a single queue for incoming and outgoing messages, but with a single queue you couldn't receive and send messages simultaneously, because only one thread at a time can access the queue.
Perhaps it would be more appropriate to use two queues: for example, this article explains a way in which you can manage a double queue so that producers and consumers can work almost simultaneously. This scenario may be fine if there are only one producer and one consumer, but if there are many clients:
How can the central server receive data simultaneously from multiple input streams?
How can the central server send data simultaneously to multiple output streams?
To resolve this problem, my idea is to use a double queue for each client: on the central server, each client connection may be associated with two queues, one for incoming messages from that client and one for outgoing messages addressed to that client. In this way the central server may send and receive data simultaneously on almost all the connections with the clients...
There are probably other ways to manage the queues... What are the parameters that determine how many queues are needed and how to organize them? Are there cases that do not need any queue at all?
To me, this idea of using a queue per client or multiple queues per client seems to miss the point. First of all, it is absolutely possible to build a queue which can be accessed simultaneously by 2 threads (one can be enqueueing an item while a different one is dequeueing another item). If you want to know how, post a specific question about that.
Second, even if we assume that only 1 thread at a time can access a single queue, and even if we assume that the server will be receiving or sending data to/from all the clients simultaneously, it still doesn't follow that you need a different queue for each client. To avoid limiting system performance, you just need to allow enough concurrency to utilize all the server's CPUs. Even with a single, system-wide queue, if dequeueing/enqueueing messages is fast enough compared to the other work the server is doing, it might not be a bottleneck. (And with an efficient implementation, simply inserting an item or removing an item from a queue should be very fast. It's a very simple operation.) For that message queue to become the bottleneck limiting performance, either you would need a LOT of CPUs, or everything else the server was doing would have to be very fast. In that case, you could work out some scheme with 2 or 4 system-wide queues, to allow 2x or 4x more concurrency.
The whole idea of using work queues in a multi-threaded system is that they 1) allow multiple consumers to all grab work from a single location, so producers can "dump" whatever work they need done at that single location without worrying about which consumer will do it, and 2) function as a load-balancing mechanism for the consumers. (Additionally, a work queue can act as a "buffer" if producers temporarily generate work too fast for the consumers.) If you have a dedicated pair of producer-consumer threads for each client, it calls into question why you need to use queues at all. Why not just do a synchronous "pass off" from dedicated producer to corresponding dedicated consumer? Or, why not use a single thread per client which acts as both producer and consumer? Using queues in the way which you are proposing doesn't seem to really gain anything.
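To make that concrete, here is a small sketch of a single, system-wide work queue shared by several producer and consumer threads, using Java's thread-safe LinkedBlockingQueue; the message type and thread counts are arbitrary.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch: one shared queue, many producers (client readers) and many
    // consumers (message forwarders). The queue itself is thread-safe, so no
    // per-client queues are needed for correctness.
    public class SharedWorkQueueDemo {

        public static void main(String[] args) {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

            // Producers: e.g. one per client connection, enqueueing incoming messages.
            for (int i = 0; i < 4; i++) {
                final int clientId = i;
                new Thread(() -> {
                    for (int n = 0; n < 5; n++) {
                        try {
                            queue.put("message " + n + " from client " + clientId);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }).start();
            }

            // Consumers: a small pool that forwards messages to their targets.
            for (int i = 0; i < 2; i++) {
                Thread consumer = new Thread(() -> {
                    try {
                        while (true) {
                            String message = queue.take();   // blocks only when idle
                            System.out.println("forwarding: " + message);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                consumer.setDaemon(true);   // demo only: let the JVM exit when producers finish
                consumer.start();
            }
        }
    }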

Impact when changing instance count for asp.net web role in Azure

I'm having no luck trying to find out how changing the instance count for an ASP.NET web role affects requests currently being processed.
Here's the scenario:
An ASP.NET site is deployed with 6 instances
Via the console I reduce the instance count to 4
Is Azure smart enough not to remove instances from the pool while they are currently processing requests, or does it just kill them mid-request?
I've been through the Azure docs, Google, and a number of emails to MS tech support, none of which were able to answer this seemingly simple question. I know about the events that get triggered by a shutdown, etc., but that doesn't really help in a web-site scenario with a live person waiting for the response to their request.
You cannot choose which instances to kill off. Primarily this is due to Windows Azure's instance allocation scheme, where your instances are split into different fault domains (meaning different areas of the data center - different rack, etc.). If you were to choose the instances to kill, this could leave you in a state where your remaining instances are in the same fault domain, which would void the SLA.
Having said that: You get an event when your role instance is shutting down (the OnStop() event). If you capture this event, you can do instance cleanup in preparation for VM shutdown. I can't recall if you're taken out of the load balancer at this point, but you could always force yourself out with a simple PowerShell command (Set-RoleInstanceStatus -Busy). This way your asp.net instance stops taking requests, and you can more easily shut down in a graceful manner.
EDIT: Sorry - didn't quite address all of your question. Since you get to capture OnStop(), you might have to implement a mechanism to make sure nothing's being processed in that instance. Since you're out of the load balancer, and assuming your requests are processed fairly quickly (2-5 seconds), you shouldn't have to wait long to clear out remaining requests. There's probably a performance counter to check, to see how many active requests are being handled.
Just to add to David's answer: the OnStop event happens when you are off the load balancer. For web apps, it is usually sufficient time to bleed out all requests after you are disconnected from the LB until the instance is shutdown. However, for long running or stateful connections (perhaps to a worker role), there would be an abrupt disconnect in some cases. While the OnStop method removes you from the LB, it does not terminate open connections. It simply prevents you from getting new connections. For web apps, this is usually enough time to complete the request (and you can delay the shutdown if necessary in the OnStop as well if you really want to).
