firebase database equivalent of MySQL transaction - firebase

I'm seeking something where I can thread through multiple updates to multiple firebase.database.References (before performing a commit) a single object and then commit that at the end and if it is unsuccessful no changes are made to any of my Firebase References.
Does this exist? the firebase.database.Transaction I thought would be similar since it is an atomic update and it does involve a callback which says if it has been committed or not, but the update function, I believe, is only for a single object, and the function doesn't seem to return a transactionId or something I could pass to other firebase.database.Transactionss or something.
UPDATE
This transaction's update seems to return a Transaction which would lend itself to perhaps chaining: https://firebase.google.com/docs/reference/js/firebase.firestore.Transaction
however this is different from the other Transaction:

Firebase Database transactions perform an update to a single location based on the current value of that same location. They explicitly do not work across multiple locations, since that would limit their scalability. Sometimes developers work around this by performing a transaction higher up in their JSON tree (at the first common point of the locations). I'd recommend against that, as that would limit the scalability even further.
The only way to efficiently update multiple locations with one API call, is with a multiple location update. This does however not have reading of the current value built-in.
So if you want to update multiple locations based on their current value, you'll have to perform the read operation in your application code, turn that into a multi-location update, and then use security rules to ensure all of those updates follow your application rules. This is a quite non-trivial approach, so I hardly see it being done in practice. See my answer here for an example: Is the way the Firebase database quickstart handles counts secure?

Related

Can I use Google CloudFunctions for reliable application purposes?

I remember to have read an article where it was explained that Cloud Functions are not guaranteed to be executed and especially in the right order. I can't find any sources on this anymore.
Is this still recent information?
I am aware that the start of a function can take a couple seconds, especially when cold starting the function.
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up? I know this is done often but I've never seen information on whether or not it is safe to do.
Following up on question one, are there red flags when using Cloud Functions for payment backend services?
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Could I reliably increment a number each time a document is created in a specific Firestore collection without getting my numbers mixed up?
You can certainly write code to do that. You will need to keep track of a running count of documents in another document, and use a transaction to keep it up to date.
I don't recommend doing this. It's kind an anti-pattern in Firestore to impose sequentially increasing numbers for documents in a collection. If you want time-based ordering, you should consider using a timestamp instead.
Can I be sure that Cloud Functions are executed in the order that they were triggered i.e. are they queued or executed in parallel?
Cloud Functions provides absolutely no guarantee that functions invocations will happen in any order. They are asynchronous and can execute in parallel on multiple server instances, depending on the load applied to the function.
I strongly suggest reading through the documentation to understand the execution environment provided by Cloud Functions.

Downside of using transactions in google firestore

I'm developing a Flutter App and I'm using the Firebase services. I'd like to stick only to using transactions as I prefer consistency over simplicity.
await Firestore.instance.collection('user').document(id).updateData({'name': 'new name'});
await Firestore.instance.runTransaction((transaction) async {
transaction.update(Firestore.instance.collection('user').document(id), {'name': 'new name'});
});
Are there any (major) downsides to transactions? For example, are they more expensive (Firebase billing, not computationally)? After all there might be changes to the data on the Firestore database which will result in up to 5 retries.
For reference: https://firebase.google.com/docs/firestore/manage-data/transactions
"You can also make atomic changes to data using transactions. While
this is a bit heavy-handed for incrementing a vote total, it is the
right approach for more complex changes."
https://codelabs.developers.google.com/codelabs/flutter-firebase/#10
With the specific code samples you're showing, there is little advantage to using a transaction. If your document update makes a static change to a document, without regard to its existing data, a transaction doesn't make sense. The transaction you're proposing is actually just a slower version of the update, since it has to round-trip with the server twice in order to make the change. A plain update just uses a single round trip.
For example, if you want to append data to a string, two clients might overwrite each other's changes, depending on when they each read the document. Using a transaction, you can be sure that each append is going to take effect, no matter when the append was executed, since the transaction will be retried with updated data in the face of concurrency.
Typically, you should strive to get your work done without transactions if possible. For example, prefer to use FieldValue.increment() outside of a transaction instead of manually incrementing within a transaction.
Transactions are intended to be used when you have changes to make to a document (or, typically, multiple documents) that must take the current values of its fields into account before making the final write. This prevents two clients from clobbering each others' changes when they should actually work in tandem.
Please read more about transactions in the documentation to better understand how they work. It is not quite like SQL transactions.
Are there any (major) downsides to transactions?
I don't know any downsides.
For example, are they more expensive (Firebase billing, not computationally)?
No, a transaction costs like any other write operaton. For example, if you create a transaction to increase a counter, you'll be charged with only one write operation.
I'm not sure I understand your last question completely but if a transaction fails, Cloud Firestore retries the transaction for sure.

Does moving from None to Consistent Indexing mode in Cosmos Db re-indexes everything?

The azure cosmos db documentation mentions following in the index transformation section :
When you move to None indexing mode, the index is dropped immediately. Moving to None is useful when you want to cancel an in-progress transformation and start fresh with a different indexing policy.
I have a partitioned collection with custom indexing policy with indexing mode consistent. Since there is no transformation progress available for a partitioned collection, so before starting any new indexing policy update, I was thinking of using None mode to cancel any in progress transformation and start the new one(consistent mode). But will this cause the entire index to get recreated, even if, just a new path was added?
If answer is yes, what is the best way of checking if no transformation is in progress before starting a new one?
Welcome to StackOverflow, Jatin! I am from the CosmosDB engineering team.
We do have transformation progress now for a partitioned collections - this was released recently. This can be fetched by setting the header 'x-ms-documentdb-populatequotainfo' to True. This header, in addition to providing quota and usage information for a partitioned collection, is also used to piggyback the index transformation progress information for partitioned collections.
However, you do not need to wait for the progress to complete before changing your indexing policy. Even though this would mean that it could take longer for our background re-indexing to finish all previous index updates and catch up to your latest indexing policy, there is an opportunity for the transformations to be online, meaning that queries could continue to work based on the changes between the indexing policy updates. For example, if the only change was adding paths throughout the transformations, queries on paths that are already indexed will continue to work throughout the reindexing process, assuming that the indexing mode was consistent through all the updates.
However, if you want your new index to be effective potentially sooner, then your suggestion of transitioning to None and then to Consistent could work. Please note that this is equivalent to an offline index rebuild, meaning the existing index will be deleted and the new one will be constructed from scratch. Queries will not return consistent results on any path up until the reindexing process completes.
Welcome to Stack Overflow!
Simply change the indexing policy from Custom to Consistent. Skip going to None and go directly to Consistent indexing policy:
All index transformations are performed online. The items indexed per the old policy are efficiently transformed per the new policy without affecting the write availability or the throughput  provisioned to the container. The consistency of read and write operations that are performed by using the REST API, SDKs, or from stored procedures and triggers is not affected during index transformation.
Index transformations
While reindexing is in progress, your query may not return all matching results, if the queries happen to use the index that is being modified, and queries will not return any errors/failures. While reindexing is in progress, queries are eventually consistent regardless of the indexing mode configuration.
If you feel that None is safe step before going to Consistent, you are free to do so but it is not absolutely necessary.

Deleting very large collections in Firestore

I need to delete very large collections in Firestore.
Initially I used client side batch deletes, but when the documentation changed and started to discouraged that with the comments
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem that I am running into now is that the cloud function seems to be able to manage around 300 deletes per seconds. With the maximum runtime of a cloud function of 9 minutes I can manage up to 162000 deletes this way. But the collection I want to delete currently holds 237560 documents, which makes the cloud function timeout about half way.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too muck work that can be performed in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in a subsequent invocations after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing it the arguments to use to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents that begin with 'a'. And another with 'b', etc by querying for them, then deleting them.
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior stopped.
You should be able to use one of these strategies to limit the amount of work done per functions, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies are viable, you will have to manage your own server (perhaps via App Engine), and message (via pubsub) it to perform a single unit of long-running work in response to a Cloud Function.

Aggregation on FireStore/CloudDatastore. Use Cloud Functions onCreate/Update?

I want to create an expense tracker and one of the things I want to find out is how much did I spend in each month per category.
How should I do this in FireStore/DataStore?
Pull down required data and do aggregation locally? Seems very slow?
Perform aggregation everytime a transaction is created/updated and save it in a table? But this may result in many invocations of the functions, which may be costly?
Is there a better way? Seems like 2 is currently the best option? But I wonder if theres anyway I can reduce costs?
I note that I may not need the aggregated data to be realtime, so is there a way to debounce the cloud function execution? Since I note that at times, I will batch insert a bunch of transactions. Wonder if theres a way to disable functions for certain queries and manually call them after the batch has finished for example?
The two approaches you describe are indeed the most common.
The best approach mostly depends on the number of transactions you have. If you have few transactions, then it may be totally fine to do the aggregation on each client. But as you get more transactions, the overhead of downloading the data will become prohibitive and you're more likely to want to keep a running total in the database.
I'd normally recommend keeping the total up to date with any transaction. You can even do that with client-side code, by using transactions (to prevent multiple users overwriting each other's updates) and server-side security rules (to prevent malicious actors from writing an aggregate that doesn't match its transaction).
If you want to aggregate in batches, you'll want to run code periodically, either in a server you control, or in Cloud Functions.
There is nothing built into Cloud Functions to debounce document writes. You could probably keep a debounce counter in Firestore, but that would then be reading/writing a document on each transaction.
More reasonable seems to run a function on a timer, as described in this blog post and shown in this video. But you'll need to make sure your data structure in that case allows the code to detect what transactions it needs to aggregate.
One way to do this is to ensure the transactions can be ordered in some way, e.g. by giving them a timestamp, and having your aggregation code keep track (likely in the database) of the last timestamp it has aggregated already. Then whenever the aggregator runs, it:
reads the current aggregated value
queries the database for transactions that have been added since it last ran
loops over those transactions, updating the aggregated value
writes the aggregated value and the last timestamp back to the database in a transaction (to ensure either both are written, or neither is written)

Resources