Using SQS or DynamoDB to control order status - amazon-dynamodb

I am building a system that processes orders. Each order follows a workflow, so an order can be, e.g., booked, accepted, payment approved, cancelled, and so on.
Every time the status of an order changes, I will post this change to SNS. To know whether an order's status has changed, I need to make a request to an external API and compare the result to the last known status.
The question is: What is the best place to store the last known order status?
1. An SQS queue. Every time I read a message from the queue, I check the status using the external API, delete the message, and insert another one with the new status.
2. Use a database (like DynamoDB) to control the order status.

You should not use the word "store" to describe something happening with stateful facts and a queue. Stateful, factual information should be stored -- persisted -- to a database.
The queue messages should be treated as "hints" on what work needs to be done -- a request to consider the reasonableness of a proposed action, and if reasonable, perform the action.
What I mean by this, is that when a queue consumer sees a message to create an order, it should check the database and create the order if not already present. Update an order? Check the database to see whether the order is in a correct status for the update to occur. (Canceling an order that has already shipped would be an example of a mismatched state).
Queues, by design, can't be as precise and atomic in their operation as a database should be. The Two Generals Problem is one of several scenarios that become an issue in dealing with queues (and indeed with designing a queue system) -- messages can be lost or delivered more than once.
What happens in a "queue is authoritative" scenario when a message is delivered (received from the queue) more than once? What happens if a message is lost? There's nothing wrong with using a queue, but I respectfully suggest that in this scenario the queue should not be treated as authoritative.
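The "message as a hint" idea above can be sketched roughly as follows. This is only an illustration: the transition table, the `Message` type, and the plain `dict` standing in for the database are all assumptions made for the example, not part of the original question.

```python
from dataclasses import dataclass

# Hypothetical workflow: which status changes are allowed from each status.
ALLOWED = {
    None: {"booked"},
    "booked": {"accepted", "cancelled"},
    "accepted": {"payment approved", "cancelled"},
    "payment approved": {"shipped", "cancelled"},
    "shipped": set(),
    "cancelled": set(),
}


@dataclass(frozen=True)
class Message:
    order_id: str
    new_status: str


def handle(db: dict, msg: Message) -> str:
    """Treat the queue message as a hint; the database decides what happens."""
    current = db.get(msg.order_id)  # None means the order does not exist yet
    if current == msg.new_status:
        return "duplicate"  # delivered more than once: nothing left to do
    if msg.new_status not in ALLOWED.get(current, set()):
        return "rejected"  # e.g. cancelling an order that has already shipped
    db[msg.order_id] = msg.new_status
    return "applied"
```

With this shape, a message that is delivered twice is harmless, and a lost message leaves the database in its last known good state. In a real database the read and the write would need to be one atomic operation (in DynamoDB, for example, a conditional update), which this in-memory sketch does not model.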

I would go with the database option instead of SQS:
1) Option SQS:
One application changes the status.
It adds the status value to SQS.
Another application then reads your messages, sends the notification, and deletes the message.
2) Option DynamoDB:
Insert your updated status into DynamoDB.
Configure a Lambda function to run when that field is updated.
The Lambda function sends the notification.
The database option looks cleaner. Additionally, you don't have to worry about maintaining a queue, and with a queue you read one message at a time unless you implement parallel readers. In a database, you can update multiple rows, each update triggers the Lambda, and you don't have to worry about it.
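A rough sketch of what the Lambda side of option 2 could look like is below. It assumes the table has a DynamoDB stream with both old and new images enabled, and that items have string attributes named `order_id` (the key) and `status`; those names, and the `publish` callable, are assumptions for the example. In a real function `publish` would probably wrap something like boto3's SNS `publish` call.

```python
def status_changes(event: dict) -> list:
    """Extract (order_id, old_status, new_status) from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # only updates to existing items are of interest here
        data = record["dynamodb"]
        old = data.get("OldImage", {}).get("status", {}).get("S")
        new = data.get("NewImage", {}).get("status", {}).get("S")
        if old != new:
            # "order_id" as the key attribute is an assumption of this sketch
            changes.append((data["Keys"]["order_id"]["S"], old, new))
    return changes


def make_handler(publish):
    """Build a Lambda-style handler that sends one notification per change."""
    def handler(event, context):
        changes = status_changes(event)
        for order_id, old, new in changes:
            publish(f"Order {order_id}: {old} -> {new}")
        return len(changes)
    return handler
```

Passing `publish` in keeps the sketch testable without AWS credentials; updates that touch other attributes but leave `status` unchanged produce no notification.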
Hope that helps

Related

DDD: persisting domain objects into two databases. How many repositories should I use?

I need to persist my domain objects into two different databases. This use case is purely write-only. I don't need to read back from the databases.
Following Domain Driven Design, I typically create a repository for each aggregate root.
I see two alternatives. I can create one single repository for my AR, and implement it so that it persists the domain object into the two databases.
The second alternative is to create two repositories, one for each database.
From a domain driven design perspective, which alternative is correct?
My requirement is that it must persist the AR in both databases - all or nothing. So if the first one goes through and the second fails, I would need to remove the AR from the first one.
If you had a transaction manager that were to span across those two databases, you would use that manager to automatically roll back all of the transactions if one of them fails. A transaction manager like that would necessarily add overhead to your writes, as it would have to ensure that all transactions succeeded, and while doing so, maintain a lock on the tables being written to.
If you consider what the transaction manager is doing, it is effectively writing to one database and ensuring that write is successful, writing to the next, and then committing those transactions. You could implement the same type of process using a two-phase commit process. Unfortunately, this can be complicated because the process of keeping two databases in sync is inherently complex.
You would use a process manager or saga to manage the process of ensuring that the databases are consistent:
Write to the first database and leave the record in a PENDING status (not visible to user reads).
Make a request to second database to write the record in a PENDING status.
Make a request to the first database to leave the record in a VALID status (visible to user reads).
Make a request to the second database to leave the record in a VALID status.
The issue with this approach is that the process can fail at any point. In this case, you would need to account for those failures. For example,
You can have a process that comes through and finds records in PENDING status that are older than X minutes and continues pushing them through the workflow.
You can have a process that cleans up any PENDING records after X minutes and purges them from the database.
Ideally, you are using something like a queue based workflow that allows you to fire and forget these commands and a saga or process manager to identify and react to failures.
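As a rough illustration of the four steps and the "keep pushing stale records through" recovery option, here is a small in-memory sketch. The two dicts stand in for the two databases, and the function names, the `fail_after` switch, and the record layout are all made up for the example.

```python
PENDING, VALID = "PENDING", "VALID"


def write_pending(db: dict, key: str, value, now: float) -> None:
    db[key] = {"value": value, "status": PENDING, "at": now}


def mark_valid(db: dict, key: str) -> None:
    db[key]["status"] = VALID


def save_to_both(first, second, key, value, now, fail_after=None) -> bool:
    """Run the four steps in order; fail_after=N simulates a crash after step N."""
    steps = [
        lambda: write_pending(first, key, value, now),
        lambda: write_pending(second, key, value, now),
        lambda: mark_valid(first, key),
        lambda: mark_valid(second, key),
    ]
    for number, step in enumerate(steps, start=1):
        step()
        if fail_after == number:
            return False
    return True


def resume_stale(first, second, max_age, now) -> list:
    """Recovery sweep: finish the workflow for records stuck longer than max_age."""
    resumed = []
    for key, record in first.items():
        if now - record["at"] < max_age:
            continue  # still fresh, the original writer may yet finish
        if record["status"] == VALID and second.get(key, {}).get("status") == VALID:
            continue  # already consistent in both databases
        if key not in second:
            write_pending(second, key, record["value"], record["at"])
        mark_valid(first, key)
        mark_valid(second, key)
        resumed.append(key)
    return resumed
```

The purge variant would look the same, except that the sweep deletes the stale record from both stores instead of promoting it.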
The second alternative is to create two repositories, one for each database.
Based on the above, hopefully you can understand why this is the correct option.
If you don't need to read, why not build some sort of command log?
The log acts as a queue: you write each operation to it, and two processes pull new commands from it, each one updating one database. If you can accept that, in the worst case, the two databases hold different versions of the data, with the guarantee that they will eventually be consistent, this seems to me much easier than transactions spanning two different databases.
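A minimal sketch of that command log idea, with an in-memory list as the log and a dict per "database", might look like this. `CommandLog`, `Projector`, and the `(key, value)` command shape are assumptions for illustration only.

```python
class CommandLog:
    """Append-only log; each consumer tracks its own read position."""

    def __init__(self):
        self._entries = []

    def append(self, command) -> int:
        self._entries.append(command)
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset: int) -> list:
        return self._entries[offset:]


class Projector:
    """One process per database: applies commands it has not seen yet."""

    def __init__(self, log: CommandLog):
        self.log = log
        self.offset = 0
        self.db = {}

    def catch_up(self, limit=None) -> int:
        pending = self.log.read_from(self.offset)
        if limit is not None:
            pending = pending[:limit]  # simulate a consumer that lags behind
        for key, value in pending:
            self.db[key] = value
            self.offset += 1  # advance only after the write has been applied
        return len(pending)
```

The two projectors can temporarily disagree, but since both replay the same log in the same order they end up with the same data once each has caught up.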
I'm not sure how much of a DDD use case this is: if you don't need to read back, you don't have any state to manage, so there is no need for entities or aggregates.

Google PubSub queue for distributed state management

I have n sources that a job depends on.
Each source has a separate topic in Google PubSub; when a source is updated it sends a message in the corresponding topic subscription. When all sources are updated (i.e. when there is at least one new message in each subscription) the job can start.
The job is scheduled with Airflow. The DAG starts with a series of parallel tasks, one for each subscription, that check whether a new message has been published, but without acknowledging it. The next task waits for all the previous ones and uses XCom to see whether every subscription contains a message. If so, it proceeds with the job (it first acknowledges the messages); otherwise it stops.
In this way I acknowledge the messages only when they are all available, using PubSub as a coordinator. The messages frequency is once or twice a day at most.
Basically I'm using PubSub as way to keep "state". Suppose I have different jobs that depend on the same source. I can create a subscription for the same topic on each job and it all works fine.
Is there a better way/tool/framework to do this?
Given the volume of messages that you have, and based on my previous implementations, I recommend persisting the state in Firestore: serverless, affordable, fast...
When a message is published, trigger a function that persists the state in Firestore.
Then trigger as many processes as you want; each one queries Firestore to check whether all the states are OK, and continues or stops.
It's my pattern for synchronization. Not necessarily the best!
Anyway, creating a subscription per process also works. The messages are duplicated in each subscription, so you can process them independently.
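The "persist the state, then check whether every source is ready" pattern could be sketched like this. A plain dict stands in for the Firestore document, and `SourceState` with its method names is hypothetical, so treat it as an outline rather than Firestore code.

```python
class SourceState:
    """Tracks, for one job, which of its sources have an unconsumed update."""

    def __init__(self, sources):
        self.sources = set(sources)
        self.updated = {}  # source name -> id of the latest update message

    def record_update(self, source: str, message_id: str) -> None:
        """Called by the function triggered when a message is published."""
        if source not in self.sources:
            raise KeyError(source)
        self.updated[source] = message_id

    def ready(self) -> bool:
        """True once every source has reported at least one update."""
        return self.sources.issubset(self.updated)

    def consume(self):
        """If ready, return the updates and reset the state; otherwise None."""
        if not self.ready():
            return None
        snapshot = dict(self.updated)
        self.updated.clear()
        return snapshot
```

With several jobs depending on the same source, each job would keep its own state record, which mirrors the one subscription per job setup described in the question.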

How to implement outbox pattern in Cosmos DB

I'm looking to implement support for the outbox pattern in Cosmos DB.
However, Cosmos DB doesn't seem to support transactions across collections.
Then how do I do it?
I've been considering a few approaches to implement this:
Use Service bus transactions
Within a Service bus transaction scope, send the message (not committed just yet), do the Cosmos DB update and, if it works, then we commit the service bus transaction to have the message made available to subscribers.
Use triggers to insert rows in the outbox collection
As inserts/updates happen, we use Cosmos DB triggers to insert the respective messages into the outbox table and from then on, it's business as usual.
Use triggers to execute azure functions
Create Azure functions as Cosmos DB triggers. I almost like this but it would be so much better to get a message straight to service bus.
Use a data pump
Add two fields, UpdateTimestamp and OutboxMessageTimestamp. When a record is updated, so is UpdateTimestamp.
Some process looks for records in which these two don't match and for each of those creates a notification message and relays it to the respective queues or topics.
Of course, then it updates the second timestamp so they match.
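To make the data pump idea concrete, here is a small in-memory sketch. The two field names come from the description above; the `save` and `pump` functions, the dict of records, and the `send` callable are assumptions for the example.

```python
def save(records: dict, record_id: str, data, now: float) -> None:
    """Every write bumps UpdateTimestamp and leaves the outbox stamp alone."""
    record = records.setdefault(record_id, {"OutboxMessageTimestamp": None})
    record["data"] = data
    record["UpdateTimestamp"] = now


def pump(records: dict, send) -> int:
    """Relay a message for every record whose two timestamps do not match."""
    relayed = 0
    for record_id, record in records.items():
        stamp = record["UpdateTimestamp"]
        if record["OutboxMessageTimestamp"] == stamp:
            continue  # already relayed for this version of the record
        send({"id": record_id, "data": record["data"]})
        # Stamp only after a successful send: a crash in between means the
        # message is sent again on the next run (at-least-once delivery).
        record["OutboxMessageTimestamp"] = stamp
        relayed += 1
    return relayed
```

Because the message goes out before the second timestamp is written, consumers of the queue or topic should tolerate duplicates.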
Other ideas on how to do this?
In general, you store things in your Cosmos DB collection. The change feed then sends those changes to some observer (let's say an Azure Function). Your Azure Function can then do whatever it needs: put the change in a queue for other consumers, save it into another collection projected differently, etc. Within your Azure Function you should implement your own dead-letter queue for failures that are not related to the function runtime (for example, writing to another collection failed due to an id conflict).
[UPDATE]
Let me add a bit more as a response to your comment.
From my experience, doing things atomically in distributed systems boils down to:
Always do things in same order
Make second step idempotent (ensuring you can repeat it any number of times getting same result)
Once first step succeeded - repeat second step until successful
So, in case you want to send email upon something saved into cosmos db, you could:
Save record in cosmos db
Have azure function listen to change feed
Once you receive inserted document > send email (more robust solution would actually put it in queue from which some dedicated consumer sends emails)
An alternative would be to have the initial command (to save the record) put in a queue and then have 2 consumers (one for saving and one for sending emails), but then you have a problem of ordering (if that's important for you).
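The "idempotent second step, repeated until successful" rule from the list above can be sketched as follows. `EmailSender`, `repeat_until_successful`, and the idea of keying on a document id are illustrative assumptions; a real change feed handler would keep the set of handled ids in durable storage.

```python
class EmailSender:
    """Second step made idempotent by remembering which documents were handled."""

    def __init__(self):
        self.sent_ids = set()
        self.outbox = []  # stands in for emails actually sent

    def send_once(self, doc_id: str, body: str) -> None:
        if doc_id in self.sent_ids:
            return  # repeating the step has no further effect
        self.outbox.append((doc_id, body))
        self.sent_ids.add(doc_id)


def repeat_until_successful(step, max_attempts: int = 5) -> int:
    """Call step() until it stops raising; return the number of attempts used."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return attempt
        except Exception as error:  # a sketch; real code would be more selective
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The retry loop is only safe because the step is idempotent: if the email goes out but the acknowledgement is lost, the repeated call finds the document id already recorded and does nothing.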

How to send requests with AFNetworking 2 in strict sequential order?

I'm doing a sync to mirror a SQLite DB to a server one.
I have a master-detail table, where the details must be sent to the server ASAP. However, it is possible that detail 3 arrives before detail 2. I need to mimic the steps made to the document and respect the order of the operations.
When a record is saved locally, I send a notification and then post the data. How can I guarantee a strict sequential order using AFNetworking?
By default, operations run concurrently, with no guarantee of order. The only way to ensure that requests run in order is to prevent more than one request operation from running at a given time, by setting the operationQueue.maxConcurrentOperationCount property to 1 (or, if you're not using a manager, by enqueueing operations into an operation queue with that property set the same way).
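The same principle, a queue that runs at most one operation at a time, can be shown outside of AFNetworking. The following is a Python analogue using a single-worker thread pool, not AFNetworking code; `send_in_order` and the `send` callable are hypothetical names for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def send_in_order(requests, send) -> list:
    """Run send(request) for each request, strictly one at a time, in order."""
    # max_workers=1 plays the role of a concurrency limit of 1: the next
    # request does not start until the previous one has finished.
    with ThreadPoolExecutor(max_workers=1) as queue:
        futures = [queue.submit(send, request) for request in requests]
        return [future.result() for future in futures]
```

The trade-off is throughput: one slow request delays everything queued behind it.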

NServiceBus, when are too many messages used?

When considering a service in NServiceBus at what point do you start questioning how many messages handled by a service is too much and start to break these into a new service?
Consider the following: I have a sales service which can currently be broken into a few distinct business components, these are sales order validation, sales order processing, purchase order validation and purchase order processing.
There are currently about 20 message handlers and 2 sagas used within this service. My concern is that during high-volume traffic from my website, this can cause the number of messages to spike into the hundreds. Considering that the messages need to be processed in the order they are taken off the queue, this can cause a delay for the last ones in the queue (depending on what processing each message does).
When separating concerns within a service into smaller business components, I find this makes things a little easier. Sure, it's a logical separation, but it seems to provide a layer of clarity and understanding. To me it seems an easier option than creating new services, where in the end the more services I have, the more maintenance I need to do.
Does anyone have any similar concerns to this?
I think you have actually answered your own question :)
As soon as the message volume reaches a point where the lag becomes an issue you could look to instance your endpoint. You do not necessarily need to reduce the number of handlers. You could simply install the service a number of times and have specific message types sent to the relevant endpoint by mapping.
So it becomes a matter of a simple instance installation and some config changes. So you can then either split messages on sending so that messages from a particular source end up on a particular endpoint (maybe priority) or on message type.
I happened to do the same thing on a previous project (not using NServiceBus though) where we needed document conversion messages coming from the UI to be processed ASAP. We simply installed the conversion service again with its own set of queues and changed the UI configuration to send the conversion messages to the new endpoint. The background conversion messages were still going to the previous endpoint. So here the source determined the separation.
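The "same service installed twice, with messages mapped to endpoints" idea boils down to a routing table. The sketch below is a language-neutral illustration in Python, not NServiceBus configuration; `Router`, the endpoint names, and the message type names are all made up.

```python
from collections import defaultdict


class Router:
    """Sends each message type to the endpoint it is mapped to."""

    def __init__(self, mapping: dict, default_endpoint: str):
        self.mapping = dict(mapping)
        self.default_endpoint = default_endpoint
        self.queues = defaultdict(list)  # endpoint name -> queued messages

    def send(self, message_type: str, payload) -> str:
        endpoint = self.mapping.get(message_type, self.default_endpoint)
        self.queues[endpoint].append((message_type, payload))
        return endpoint
```

Urgent message types then wait only behind other urgent messages, while everything else keeps flowing to the original endpoint.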

Resources