How to avoid Firestore document write limit when implementing an Aggregate Query? - firebase

I need to keep track of the number of photos I have in a Photos collection. So I want to implement an Aggregate Query as detailed in the linked article.
My plan is to have a Cloud Function that runs whenever a Photo document is created or deleted, and then increment or decrement the aggregate counter as needed.
This will work, but I worry about running into the 1 write/document/second limit. Say that a user adds 10 images in a single import action. That is 10 executions of the Cloud Function in more-or-less the same time, and thus 10 writes to the Aggregate Query document more-or-less at the same time.
Looking around I have seen several mentions (like here) that the 1 write/doc/sec limit is for sustained periods of constant load, not short bursts. That sounds reassuring, but it isn't really reassuring enough to convince an employer that your choice of DB is a safe and secure option if all you have to go on is that 'some guy said it was OK on Google Groups'. Are there any official sources stating that short write bursts are OK, and if so, how do they define a 'short burst'?
Or are there other ways to maintain an Aggregate Query result document without also subjecting all the aggregated documents to a very restrictive 1 write / second limitation across all the aggregated documents?

If you think that you'll see a sustained write rate of more than once per second, consider dividing the aggregation up in shards. In this scenario you have N aggregation docs, and each client/function picks one at random to write to. Then when a client needs the aggregate, it reads all these subdocuments and adds them up client-side. This approach is quite well explained in the Firebase documentation on distributed counters, and is also the approach used in the distributed counter Firebase Extension.
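As a rough illustration of the sharding idea above, here is a minimal in-memory sketch in TypeScript. It does not call the Firestore SDK: each array slot stands in for one shard document, and the names `ShardedCounter`, `increment`, and `total` are hypothetical, chosen only for this example.

```typescript
// In-memory model of a sharded counter. Each entry of `shards` stands in for
// one shard document; none of these names come from the Firestore SDK.
class ShardedCounter {
  private readonly shards: number[];

  constructor(numShards: number) {
    if (!Number.isInteger(numShards) || numShards < 1) {
      throw new Error("numShards must be a positive integer");
    }
    this.shards = new Array<number>(numShards).fill(0);
  }

  // A writer picks one shard at random, so concurrent writers are spread
  // across numShards documents instead of all hitting the same one.
  // Returns the index of the shard that was written.
  increment(delta: number = 1): number {
    const index = Math.floor(Math.random() * this.shards.length);
    this.shards[index] = (this.shards[index] ?? 0) + delta;
    return index;
  }

  // A reader fetches every shard and adds the values up.
  total(): number {
    return this.shards.reduce((sum, value) => sum + value, 0);
  }
}

const photoCounter = new ShardedCounter(10);
for (let i = 0; i < 10; i++) {
  photoCounter.increment(); // ten photos created in one import
}
photoCounter.increment(-1); // one photo deleted
console.log(photoCounter.total()); // prints 9
```

With N shards, each shard document receives roughly 1/N of the writes on average, at the cost of N reads whenever the total is needed.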

Related

Has firestore removed the soft limit of 1 write per second to a single document?

Firestore has always had a soft limit of 1 write per second to a single document. That meant that for doing things like a counter that updates more frequently than once per second, the recommended solution was sharded counters.
Looking at the Firestore Limits documentation, this limit appears to have disappeared. The Firebase summit presentation mentioned that Firestore is now more scalable, but only mentioned hard limits being removed.
Can anyone confirm whether this limit has indeed been removed, and we can remove all our sharded counters in favor of writing to a single count document tens or hundreds of times per second?
firebaser here
This was indeed removed from the documentation. It was always a soft limit that was not enforced in the code, but instead an estimate for the physical limitation of how long it takes to synchronize the changes to indexes and multiple data centers.
We've significantly improved the infrastructure used for these write operations, and now provide tools such as the Key Visualizer to better analyze performance and look for hot spots in the read and write behavior of your app. While there's still some physical limit, we recommend using these tools rather than depending on a single documented value to analyze your app's database performance.
For most use-cases I'd recommend using the new COUNT() operator nowadays. But if you want to continue using write-time aggregation counters, it is still recommended to use a sharded counter for high-volume count operations; we've just stopped giving a hard number for when to use it.

Strength of atomic update in Firestore increment

I'm a Firestore user who has recently been diving into the concept of "atomic" updates, especially increment updates on Firestore documents. There is a classic article on Firestore increment in the context of atomic updates. Here is my question.
Q: How strong is this atomic increment(number) update? Can this operation really run atomically without any limitation?
Let me explain in a bit more detail with an example case. We know that Firestore has a write limit of 10,000 writes per second (up to 10 MiB per second) per database instance, and we also know that Firestore's increment method updates documents atomically. So I would like to know whether the extreme example case below would work perfectly atomically.
This Firestore instance has only a single document, and numerous users (up to 10,000) update that document using the increment method, each incrementing the same field by a random double between 0 and 1, all WITHIN a single second: 10,000 updates in 1 second.
The case above uses the Firestore per-second write rate limit as fully as possible, and all operations update a single field of the same document. If the increment method handles update requests truly atomically, all 10,000 increments should be added correctly into that single field.
But this is only a theoretical, conceptual idea, and it seems really hard for Firestore (or any other database system) to handle such an extreme set of increment operations without exception while it also has to process other incoming operations in order; the Firestore instance has to keep serving incoming API requests. This is actually a real-world problem. Say the singer Ariana Grande has just uploaded an Instagram post. If we handled that event with a Firestore document, we would have to deal with thousands of increment requests for likes every second.
So, I would like to know whether the atomic increment method truly has no limitations, even when a large number of highly concurrent increment requests target very few documents. I hope this question reaches the Firebase gurus in the community! Comments are very welcome. Thanks in advance [:
I'm not sure I understand your question completely, but I'll try to help anyway by explaining how Firestore and its increment operation work.
Firestore's main write limits come from the fact that data needs to be synchronized between data centers for each write operation. This is not a quota-type limit, but a physical limit of how fast data can be pushed across the wires.
Since you talk about frequent writes to a single document, you're going to hit the soft limit of 1 sustained write per second per document sooner. This is also caused by the physical nature of how the database works: it needs to synchronize the documents and indexes between servers/data centers.
While using the increment() operation means that no roundtrip is needed between the client and the server, it makes no difference to the data that needs to be read/written on the servers themselves. Therefore it makes no difference to the documented throughput limits.
If you need to perform counts beyond the documented throughput limits, have a look at the documentation on using a distributed counter.
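To make the distinction above concrete, here is a toy in-memory model in TypeScript. It is only a sketch and not how Firestore is implemented: `FakeServerField` and its methods are hypothetical names. It shows that a server-side increment avoids the lost update that an interleaved client-side read-modify-write can cause, while both approaches still perform the same number of writes on the document.

```typescript
// Toy model of one numeric field stored on a "server" (hypothetical, not the
// Firestore SDK). It also counts how many writes the field has received.
class FakeServerField {
  private value = 0;
  private writes = 0;

  read(): number {
    return this.value;
  }

  // Client-computed write: the client sends the final value.
  overwrite(newValue: number): void {
    this.value = newValue;
    this.writes += 1;
  }

  // Server-side increment: read and write happen as one step on the server,
  // so no client round trip is needed. It is still one write.
  increment(delta: number): void {
    this.value += delta;
    this.writes += 1;
  }

  writeCount(): number {
    return this.writes;
  }
}

// Two clients do read-modify-write and interleave: both read 0, both write 1.
const racy = new FakeServerField();
const seenByA = racy.read();
const seenByB = racy.read();
racy.overwrite(seenByA + 1);
racy.overwrite(seenByB + 1);
console.log(racy.read(), racy.writeCount()); // prints 1 2 (one update lost)

// The same two updates sent as server-side increments.
const atomic = new FakeServerField();
atomic.increment(1);
atomic.increment(1);
console.log(atomic.read(), atomic.writeCount()); // prints 2 2
```

In both cases the document receives two writes, which is why the increment operation changes correctness but not the write throughput the document has to sustain.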

Firestore count number of documents

If I have a Firestore document in the following structure:
In my web app, I would like to display the number of followers. If I just do a get() of the whole followers sub-collection, that will be costly in terms of read operations. I thought about the following solution:
Having a counter document with a counter field that would be incremented by a Cloud Function every time a document is created inside the followers collection. But there is the limit of one write per second per document for that counter. The idea of having a followers collection with one document per follower is to avoid the one write per second limit (thanks to Doug Stevenson's blog: The top 10 things to know about Firestore when choosing a database for your app).
The only workaround I can think of is to use the distributed counter extension. But from what I've read so far, the counter only works with the front-end SDK. Would I be able to use the extension in a Cloud Function or in a Node.js backend to increase the followers counter?
The "one write per document per second" is a guideline and not a hard rule, so I'd highly recommend not immediately getting hung up on that.
Then again, if you think you'll consistently need to count more than can be kept in a single document, your options are:
Keep a distributed counter, as shown in the documentation on distributed counters.
Keep the counter somewhere else. For example, I typically keep counters in Realtime Database, which has much higher write throughput (but lower read concurrency per shard).
But from what I've read so far, the counter only works with the front-end SDK.
That's not true. The extension works for any query made to Firestore.
Would I be able to use the extension in a Cloud Function or in a Node.js backend to increase the followers counter?
The extension works by monitoring documents added to and removed from the collection. It doesn't matter where the change comes from. You will still be able to use the computed counter from any code that's capable of querying the counter documents.

Firestore Document "Too much contention": such thing in realtime database?

I've built an app that lets people sell tickets for events. Whenever a ticket is sold, I update the document that represents the ticket of the event in Firestore to update the stats.
At peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to "too much contention", which results in inaccurate stats since the stat update is dropped. I guess this is the result of the high load on the document.
To resolve this problem, I am considering moving the stats of the items from the item document in Firestore to the Realtime Database. Before I do, I want to be sure that this will actually resolve the problem I had with the contention on my item document. Can the Realtime Database handle such load better than a Firestore document? Is it considered good practice to move such data to the Realtime Database?
The issue you're running into is a documented limit of Firestore. There is a limit to the rate of sustained writes to a single document of 1 per second. You might be able to burst writes faster than that for a while, but eventually the writes will fail, as you're seeing.
Realtime Database has different documented limits. It's measured in the total volume of data written to the entire database. That limit is 64MB per minute. If you want to move to Realtime Database, as long as you are under that limit, you should be OK.
If you are effectively implementing a counter or some other data aggregation in Firestore, you should also look into the distributed counter solution that works around the per-document write limit by sharding data across multiple documents. Your client code would then have to use all of these document shards in order to present data.
As for whether or not any one of these is a "good practice", that's a matter of opinion, which is off topic for Stack Overflow. Do whatever works for your use case. I've heard of people successfully using either one.
At peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to "too much contention"
This is happening because Firestore cannot handle such a rate. According to the official documentation regarding quotas for writes and transactions:
Maximum write rate to a document: 1 per second
Sometimes it might work for two or even three writes per second, but at some point it will definitely fail. 10 writes per second is far too many.
To resolve this problem, I am considering moving the stats of the items from the item document in Firestore to the Realtime Database.
That's a solution that I use myself in such cases.
According to the official documentation regarding usage and limits in Firebase Realtime Database, there is no such limitation there. But it's up to you to decide if it fits your needs or not.
There is one more thing that you need to take into consideration, which is the distributed counter. It can solve your problem for sure.

What is the most cost-efficient method of making document writes/reads from Firestore?

Firebase's Cloud Firestore gives you limits on the number of document writes and reads (and deletes). For example, the Spark plan (free) allows 50K reads and 20K writes a day. Estimating the number of writes and reads is obviously important when developing an app, as you will want to know the potential costs incurred.
Part of this estimation is knowing exactly what counts as a document read/write. This part is somewhat unclear from searching online.
One document can contain many different fields, so if an app is designed such that user actions done through a session require the fields within a single document to be updated, would it be cost-efficient to update all the fields in one single document write at the end of the session, rather than writing the document every single time the user wants to update one field?
Similarly, would it not make sense to read the document once at the start of a session, getting the values of all fields, rather than reading them when each is needed?
I appreciate that this method will lead to the user seeing slightly out-of-date field values, and to the database not being updated immediately, but if such things aren't too much of a concern to you, couldn't such a method reduce your reads/writes by a large factor?
This all depends on what counts as a document write/read (does writing 20 fields within the same document in one go count as 20 writes?).
The number of fields you write has no bearing on the cost of a write operation. The cost is purely based on the number of times you call update() or set() on a document reference, whether independently, in a transaction, or in a batch.
If you choose to write each N fields using N separate updates, then you will be charged N writes. If you choose to write N fields using 1 update, then you will be charged 1 write.
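The billing rule above can be sketched with a small in-memory buffer in TypeScript. This is an illustration only: `SessionBuffer` and the `write` callback are hypothetical stand-ins for whatever code calls update() or set() on the document reference, and no Firestore SDK is involved. The sketch collects field changes during a session and sends them in one call at the end.

```typescript
type Fields = Record<string, unknown>;

// Collects field changes during a session and sends them in a single call.
// `write` is a hypothetical stand-in for one update() call on a document.
class SessionBuffer {
  private pending: Fields = {};

  set(field: string, value: unknown): void {
    this.pending[field] = value;
  }

  // Sends all pending fields in one call. Returns the number of write calls
  // made: 0 if nothing changed, otherwise 1, however many fields are pending.
  flush(write: (fields: Fields) => void): number {
    if (Object.keys(this.pending).length === 0) {
      return 0;
    }
    write(this.pending);
    this.pending = {};
    return 1;
  }
}

let billedWriteCalls = 0;
const countingWrite = (_fields: Fields): void => {
  billedWriteCalls += 1; // each call would be charged as one document write
};

const session = new SessionBuffer();
session.set("displayName", "Alice");
session.set("theme", "dark");
session.set("lastLevel", 7);
session.flush(countingWrite);
console.log(billedWriteCalls); // prints 1: three fields, one write
```

The trade-off is the one the question already names: until the buffer is flushed, the database does not reflect the user's latest changes.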
