I have a Firestore app that receives webhooks from Stripe and writes the data from each webhook to a document.
Sometimes, multiple webhooks for the same Stripe object are sent less than a second apart. For example, two Charge objects with different payment statuses are sent immediately after one another. Firestore doesn't always write these in the correct order, so in some cases the status in my database is the status from the earlier webhook and not the latest.
How can I guarantee that Firestore writes the documents in the correct order? There's a timestamp on the Stripe webhook, so it's possible to tell which one is more recent.
I was going to write a transaction that only writes the data if the date on the webhook is greater than the date of the document currently in Firestore. Is there a better way to do this, though, or do I need to wrap every one of my webhook handlers in a transaction to ensure Firestore processes them in the correct order?
Here's the basic form of the Stripe webhook:
{
  "id": "evt_1MMIvs2fEBYORq3P6PcXHdkj",
  "object": "event",
  "api_version": "2022-08-01",
  "created": 1672784768,
  "data": {
    "id": "ch_3MMMiLK6jt4dQclj1RAH3dk9",
    "status": "succeeded",
    ...
  }
}
do I need to wrap every one of my webhook handlers in a transaction to ensure Firestore processes them in the correct order?
Transactions don't guarantee a specific ordering. They just guarantee that each write occurs atomically and that they operate on a consistent view of all documents participating in the transaction: none of those documents will change outside the transaction, as long as all changes to them are also made through transactions.
In fact, it's impossible to guarantee the ordering that you're looking for since the webhooks and their database queries can appear to start or complete in any order on your backend.
The easiest thing to do is to not update a single document with each webhook invocation. Instead, write an entirely new document for each status update, creating a log of what happened. If you want to get the most recent charge status at any given moment, query for the document with the latest timestamp that Stripe provides (or some other indicator of order that they give you).
With a log like this, you will also be able to look back and see when and how everything happened with each charge, which has diagnostic value perhaps beyond what you're looking for right now, and might be helpful for customer service reasons.
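A minimal sketch of that append-only approach in Node.js with the Admin SDK, assuming the event shape shown in the question; the chargeEvents collection and field names are only illustrative:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Called from the webhook endpoint with the parsed Stripe event.
async function recordChargeEvent(event) {
  // One new document per webhook; keying on the Stripe event id makes the
  // write idempotent if Stripe retries the same event.
  await db.collection('chargeEvents').doc(event.id).set({
    chargeId: event.data.id,
    status: event.data.status,
    created: event.created, // Stripe's Unix timestamp, used for ordering
  });
}

// Latest known status for a charge = the newest event document for that charge.
// Note: this filter + order combination requires a composite index.
async function latestChargeStatus(chargeId) {
  const snap = await db.collection('chargeEvents')
    .where('chargeId', '==', chargeId)
    .orderBy('created', 'desc')
    .limit(1)
    .get();
  return snap.empty ? null : snap.docs[0].data().status;
}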
If you must update a single document, then:
Start a transaction on the document by getting it
Check the Stripe timestamp on it
If the document timestamp is less than the one in the latest webhook, update the document with the latest data.
Otherwise ignore the update and terminate the transaction early.
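A minimal sketch of that conditional update, again with the Admin SDK, assuming one document per charge in an illustrative charges collection:

// Assumes admin.initializeApp() has already run, as in the sketch above.
const db = require('firebase-admin').firestore();

async function applyChargeWebhook(event) {
  const ref = db.collection('charges').doc(event.data.id);
  await db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    const current = snap.exists ? snap.data() : null;
    // Ignore stale webhooks: only write if this event is newer than what is stored.
    if (current && current.stripeCreated >= event.created) {
      return;
    }
    tx.set(ref, {
      status: event.data.status,
      stripeCreated: event.created,
    }, { merge: true });
  });
}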
I'm looking to implement support for the outbox pattern in Cosmos DB.
However, Cosmos DB doesn't seem to support transactions across collections.
Then how do I do it?
I've been considering a few approaches to implement this:
Use Service Bus transactions
Within a Service Bus transaction scope, send the message (not committed just yet), do the Cosmos DB update, and, if it works, commit the Service Bus transaction so the message becomes available to subscribers.
Use triggers to insert rows in the outbox collection
As inserts/updates happen, we use Cosmos DB triggers to insert the respective messages into the outbox collection and, from then on, it's business as usual.
Use triggers to execute Azure Functions
Create Azure Functions triggered by Cosmos DB. I almost like this, but it would be so much better to get a message straight to Service Bus.
Use a data pump
Add two fields, UpdateTimestamp and OutboxMessageTimestamp. When a record is updated, so is its UpdateTimestamp.
Some process looks for records in which these two don't match and for each of those creates a notification message and relays it to the respective queues or topics.
Of course, then it updates the second timestamp so they match.
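Roughly, that pump could look like the following sketch using the @azure/cosmos and @azure/service-bus SDKs; the database, container, topic, partition key, and field names are all assumptions for illustration:

const { CosmosClient } = require('@azure/cosmos');
const { ServiceBusClient } = require('@azure/service-bus');

const cosmos = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = cosmos.database('shop').container('orders');
const sbClient = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING);
const sender = sbClient.createSender('order-events');

async function pumpOnce() {
  // Records whose latest update has not been relayed yet.
  const { resources } = await container.items
    .query('SELECT * FROM c WHERE c.UpdateTimestamp > c.OutboxMessageTimestamp')
    .fetchAll();

  for (const doc of resources) {
    // Relay first, then mark as relayed; consumers must tolerate duplicates,
    // because a crash between these two steps re-sends the message.
    await sender.sendMessages({ body: { id: doc.id, change: doc } });
    await container
      .item(doc.id, doc.partitionKey)
      .replace({ ...doc, OutboxMessageTimestamp: doc.UpdateTimestamp });
  }
}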
Other ideas on how to do this?
In general, you store things in your Cosmos DB collection. Then you have the change feed sending these changes to some observer (let's say an Azure Function). Then your Azure Function can do whatever: put it in a queue for other consumers, save it into another collection projected differently, etc. Within your Azure Function you should implement your own dead-letter queue for failures that are not related to the function runtime (for example, writing to another collection failed due to an id conflict).
[UPDATE]
Let me add a bit more as a response to your comment.
From my experience, doing things atomically in distributed systems boils down to:
Always do things in the same order
Make the second step idempotent (so you can repeat it any number of times and get the same result)
Once the first step has succeeded, repeat the second step until it succeeds
So, if you want to send an email when something is saved into Cosmos DB, you could:
Save the record in Cosmos DB
Have an Azure Function listen to the change feed
Once you receive the inserted document, send the email (a more robust solution would put a message in a queue from which some dedicated consumer sends the emails)
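A rough sketch of steps 2 and 3 as a change-feed-triggered Azure Function in Node.js; it assumes a function.json with a cosmosDBTrigger binding named documents and a Service Bus output binding named outputSbMsg, and all names are illustrative:

module.exports = async function (context, documents) {
  if (!documents || documents.length === 0) {
    return;
  }
  // One outgoing message per changed document; a downstream consumer
  // (e.g. the email sender) reads them from the Service Bus queue/topic.
  context.bindings.outputSbMsg = documents.map((doc) => ({
    id: doc.id,
    type: 'record-saved',
    payload: doc,
  }));
  // Throwing here lets the change feed lease retry the batch; documents that
  // repeatedly fail for business reasons should go to your own dead-letter
  // store rather than being retried forever.
};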
An alternative would be to have the initial command (to save the record) put in a queue and then have 2 consumers (one for saving and one for sending emails), but then you have a problem of ordering (if that's important for you).
I am using the react-native-firebase package in a React Native application and am trying to understand how transactions work offline. I am trying to write a transaction using the following code:
firebase.database().ref('locations').transaction(locations => {
  // ... my location modification logic
  return locations
})
However, if I go offline before writing the transaction and have not accessed the reference previously and therefore have no cached data, locations is null.
There is this small tidbit in Firebase's official documentation
Note: Because your update function is called multiple times, it must be able to handle null data. Even if there is existing data in your remote database, it may not be locally cached when the transaction function is run, resulting in null for the initial value.
Which leads me to believe I should wrap the entire transaction logic inside
if (locations) {
  // ... my location modification logic
}
But I still don't fully understand this. Are the following assumptions correct?
1. Submit the transaction
2. If offline and cached data exists, apply the transaction against the cached data, then apply it to the current data in the remote database when connectivity resumes
3. If offline and no cached data exists, do not apply the transaction. Once connectivity resumes, apply the transaction to the current data in the remote database
4. If online, immediately apply the transaction
If these assumptions are correct, then the user will not immediately see their change in case #3, but in case #2 it will 'optimistically' update their cached data and the user will feel like their action immediately took place. Is this how offline transactions work? What am I missing?
Firebase Realtime Database (and Firestore) don't support offline transactions at all. This is because a transaction must absolutely round trip with the server at least once in order to safely commit the changes to the data, while also avoiding collisions with other clients that could be trying to change the same data.
If you're wondering why the SDK doesn't just persist the callback that handles the transaction, all that can be said is that persisting an instance of an object (and all of its dependent state, such as the values of all variables in scope) is actually very difficult, and is not even possible in all environments. So, you can expect that transactions only work while the client app is online and able to communicate with the server.
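For the null-handling part of the question, a minimal sketch of the shape the docs describe, using the same API as above (nothing here makes the transaction usable while fully offline):

firebase.database().ref('locations').transaction(
  (locations) => {
    if (locations === null) {
      // The first run can see null when nothing is cached locally. Returning
      // a value sends an attempt to the server; if the server actually holds
      // different data, the attempt is rejected and this function runs again
      // with the real current value.
      return locations;
    }
    // ... my location modification logic
    return locations;
  },
  (error, committed, snapshot) => {
    // The completion callback is the reliable signal of whether the
    // transaction actually committed on the server.
    if (error) {
      console.error('Transaction failed:', error);
    } else if (!committed) {
      console.log('Transaction aborted');
    } else {
      console.log('Committed value:', snapshot.val());
    }
  }
);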
If a subscription is rerun with the "same arguments" in a flush cycle, it reuses the observer on the server and the data in minimongo:
If the subscription is run with the same arguments then the “new” subscription discovers the old “marked for destruction” subscription that’s sitting around, with the same data already ready, and simply reuses that. - Meteor Guide
Additionally, if two subscriptions both request the same document Merge Box will ensure the data is not sent multiple times across DDP.
Furthermore, if a subscription is marked for destruction and rerun with different arguments, the observer cannot be reused. My question is: if there are documents published by both the old and the new subscription in the same flush cycle, will the overlapping documents be intelligently recycled on the client, or will they be sent over the wire a second time?
[Assume there are no other subscriptions that share this data.]
I believe the data will be reused; I need to double-check, though.
I am building a system that processes orders. Each order will follow a workflow, so an order can be, e.g., booked, accepted, payment approved, cancelled, and so on.
Every time the status of an order changes, I will post this change to SNS. To know if an order's status has changed, I will need to make a request to an external API and compare the result to the last known status.
The question is: What is the best place to store the last known order status?
1. An SQS queue. Every time I read a message from the queue, I check the status using the external API, delete the message, and insert another one with the new status.
2. Use a database (like DynamoDB) to track the order status.
You should not use the word "store" to describe something happening with stateful facts and a queue. Stateful, factual information should be stored -- persisted -- to a database.
The queue messages should be treated as "hints" on what work needs to be done -- a request to consider the reasonableness of a proposed action, and if reasonable, perform the action.
What I mean by this is that when a queue consumer sees a message to create an order, it should check the database and create the order if not already present. Update an order? Check the database to see whether the order is in a correct status for the update to occur. (Canceling an order that has already shipped would be an example of a mismatched state.)
Queues, by design, can't be as precise and atomic in their operation as a database can be. The Two Generals Problem is one of several scenarios that becomes an issue in dealing with queues (and indeed with designing a queue system) -- messages can be lost or delivered more than once.
What happens in a "queue is authoritative" scenario when a message is delivered (received from the queue) more than once? What happens if a message is lost? There's nothing wrong with using a queue, but I respectfully suggest that in this scenario the queue should not be treated as authoritative.
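One way to make the database authoritative is a conditional write, so duplicate or out-of-order messages become no-ops. A sketch with DynamoDB and the Node.js AWS SDK; the table name, key, and allowed transitions are assumptions for illustration:

const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

// Which statuses an order may move to from its current status (illustrative).
const allowedTransitions = {
  booked: ['accepted', 'cancelled'],
  accepted: ['payment approved', 'cancelled'],
  'payment approved': ['shipped'],
};

async function applyStatusChange(orderId, currentStatus, newStatus) {
  if (!(allowedTransitions[currentStatus] || []).includes(newStatus)) {
    return false; // mismatched state, e.g. cancelling an order that already shipped
  }
  try {
    // The ConditionExpression makes this a no-op if another consumer already
    // moved the order past currentStatus (a duplicate or stale message).
    await ddb.update({
      TableName: 'Orders',
      Key: { orderId },
      UpdateExpression: 'SET #status = :new',
      ConditionExpression: '#status = :current',
      ExpressionAttributeNames: { '#status': 'status' },
      ExpressionAttributeValues: { ':new': newStatus, ':current': currentStatus },
    }).promise();
    return true;
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      return false; // someone else got there first; safe to drop the message
    }
    throw err;
  }
}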
I would go with the database option instead of SQS:
1) Option SQS:
You will have one application which will change the status
Add the status value into SQS
Now another application will check your messages, send the notification, and delete the message
2) Option DynamoDB:
Insert your updated status in DynamoDB
Configure a Lambda function to fire on updates to that field (via DynamoDB Streams)
The Lambda function will send the notification
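A sketch of that Lambda in Node.js, assuming the table's stream is configured with NEW_AND_OLD_IMAGES; the topic ARN, key, and attribute names are illustrative:

const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'MODIFY') continue;

    const oldStatus = record.dynamodb.OldImage?.status?.S;
    const newStatus = record.dynamodb.NewImage?.status?.S;
    if (!newStatus || newStatus === oldStatus) continue; // status did not change

    await sns.publish({
      TopicArn: process.env.ORDER_STATUS_TOPIC_ARN,
      Message: JSON.stringify({
        orderId: record.dynamodb.Keys.orderId.S,
        oldStatus,
        newStatus,
      }),
    }).promise();
  }
};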
The database option looks cleaner. Additionally, you don't have to worry about maintaining a queue, and with a queue you can only read one message at a time unless you implement parallel readers. With a database, you can update multiple rows, each update will trigger the Lambda, and you don't have to worry about it.
Hope that helps
I'd like to understand how Firebase and listening clients behave in the situation where a large number of updates are made to an entity in a short amount of time, and a client is listening to 'value' changes on that entity.
Say I have an entity in firebase with some simple data.
{
  "entity": 1
}
And the value of that "entity" was updated very rapidly. Something like the below code that writes 1000 integers.
// pseudo-code for making 1000 writes as quickly as possible
for (var i = 0; i < 1000; i++) {
  ref.child('entity').set(i)
}
Ignoring transient issues, would a listening client using the 'on' API in a browser receive ALL 1000 notifications containing 0-999, or does Firebase have throttles in place?
First off, it's important to note that the Firebase realtime database is a state synchronization service, and is not a pub/sub service.
If you have a location that is updating rapidly, the service guarantees that eventually the state will be consistent across all clients, but not that all intermediate states will be surfaced. At most one event will fire for every update, but the server is free to 'squash' successive updates to the same location into one.
On the client making the updates, I think the current behavior is that every change propagates a local event, but I could be wrong and this is a notable exception.
In order to achieve guaranteed delivery of every intermediate state, it's possible to push (childByAutoId in Objective-C) onto a list of events at a database location instead of simply updating the value directly. Check out the Firebase REST API docs on saving lists of data.
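A minimal sketch of that push() approach against the same pseudo-code above; the entityEvents path is illustrative:

// Every intermediate value becomes its own child instead of overwriting one location.
for (var i = 0; i < 1000; i++) {
  ref.child('entityEvents').push(i)
}

// A listener on the list receives each pushed value via child_added, rather
// than a possibly squashed latest value from a single 'value' listener.
ref.child('entityEvents').on('child_added', (snapshot) => {
  console.log('received', snapshot.val());
});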