meaning of high rate in Firestore - firebase

I saw this doc (https://cloud.google.com/firestore/docs/best-practices#hotspots) and it says:
Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors. This issue is known as hotspotting, and your application can experience hotspotting if it does any of the following:
Creates new documents at a very high rate and allocates its own monotonically increasing IDs.
Cloud Firestore allocates document IDs using a scatter algorithm. You should not encounter hotspotting on writes if you create new documents using automatic document IDs.
Creates new documents at a high rate in a collection with few documents.
Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
Deletes documents in a collection at a high rate.
Writes to the database at a very high rate without gradually increasing traffic.
Does a high rate occur when a lot of users create documents at once?
Or is it talking about creating documents by running a for or while loop?

Does a high rate occur when a lot of users create documents at once? Or is it talking about creating documents by running a for or while loop?
Either of those can trigger hotspotting if the resulting write rate is high enough. More important than where the writes come from is how fast the writes come in, how you assign document IDs, and whether you're writing monotonically increasing or decreasing field values.
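For what it's worth, here is a minimal sketch of the auto-ID approach using the Python server client library; the events collection and its field are made up for illustration:
import the libraries and let the client pick the IDs:
from google.cloud import firestore

db = firestore.Client()

# document() with no argument generates a scattered, auto-assigned ID,
# so high-rate creates don't pile up on one index range.
doc_ref = db.collection(u'events').document()
doc_ref.set({u'name': u'example'})

# add() is shorthand for the same thing; it returns (update_time, doc_ref).
update_time, new_ref = db.collection(u'events').add({u'name': u'example'})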
This article goes into more detail on the timestamp case and describes a workaround:
https://cloud.google.com/firestore/docs/solutions/shard-timestamp
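The gist of that workaround is to store a random shard value alongside the timestamp so the index key space is split into several ranges. A minimal sketch in Python, assuming a made-up readings collection and five shards:
import random
from google.cloud import firestore

db = firestore.Client()
NUM_SHARDS = 5  # assumption: roughly one shard per 500 writes/second you need

def create_reading(sensor_id, temperature):
    # The random shard value breaks the otherwise monotonic timestamp
    # index entries into NUM_SHARDS separate key ranges.
    db.collection(u'readings').add({
        u'shard': random.randint(0, NUM_SHARDS - 1),
        u'sensor_id': sensor_id,
        u'temperature': temperature,
        u'timestamp': firestore.SERVER_TIMESTAMP,
    })
The trade-off is on the read side: a query over the timestamp then has to be issued once per shard value and the results merged, which is what the linked article walks through.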

Related

How can I know if indexing a timestamp field on a collection of documents is going to cause problems?

I saw in the Firestore documentation that it is a bad idea to index monotonically increasing values, and that it will increase latency. In my app I want to query posts based on Unix time, which is a double, and that number will increase as time moves on. In my case it is not perfectly monotonic, because people will not be posting every second, and I don't think my app will exceed 4 million users. Does anyone with expertise think this will be a problem for me?
It should be no problem. Just make sure to store it as a number and not as a string. Otherwise the sorting would not work as expected.
This is exactly the problem that the Firestore documentation is warning you about. Your database code will incur a cost of "hotspotting" on the index for the timestamp at scale. Specifically, from that linked documentation:
Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
The numbers don't have to be purely monotonic. The hotspotting happens on ranges that are used for sharding the index. The documentation just doesn't tell you what to expect for those ranges, as they can change over time as the index gains more documents.
Also from the documentation:
If you index a field that increases or decreases sequentially between documents in a collection, like a timestamp, then the maximum write rate to the collection is 500 writes per second. If you don't query based on the field with sequential values, you can exempt the field from indexing to bypass this limit.
In an IoT use case with a high write rate, for example, a collection containing documents with a timestamp field might approach the 500 writes per second limit.
If you don't have a situation where new documents are being added rapidly, it's not a near-term problem. But you should be aware that writes just don't scale the way reads and queries against that index will. Note that the number of concurrent users is not the issue at all - it's the number of documents being added per second to an index shard, regardless of how many people are causing that behavior.

How to write ~500K documents every day efficiently in Firestore?

I have a Cloud Function in Python 3.7 to write/update small documents to Firestore. Each document has a user_id as its document ID, and two fields: a timestamp and a map (a dictionary) with three key-value pairs, all of them very small.
This is the code I'm using to write/update Firestore:
from datetime import datetime
from google.cloud import firestore
db = firestore.Client()
doc_ref = db.collection(u'my_collection').document(user['user_id'])
date_last_seen = datetime.combine(date_last_seen, datetime.min.time())
doc_ref.set({u'map_field': map_value, u'date_last_seen': date_last_seen})
My goal is to call this function once every day and write/update ~500K documents. I tried the following tests; for each one I include the execution time:
Test A: Process the output to 1000 documents. Don't write/update Firestore -> ~ 2 seconds
Test B: Process the output to 1000 documents. Write/update Firestore -> ~ 1 min 3 seconds
Test C: Process the output to 5000 documents. Don't write/update Firestore -> ~ 3 seconds
Test D: Process the output to 5000 documents. Write/update Firestore -> ~ 3 min 12 seconds
My conclusion here: writing/updating Firestore is consuming more than 99% of my compute time.
Question: How to write/update ~500K documents every day efficiently?
It's not possible to prescribe a single course of action without knowing details about the data you're actually trying to write. I strongly suggest you read the documentation about best practices for Firestore. It will give you a sense of what things you can do to avoid problems with heavy write loads.
Basically, you will want to avoid these situations, as described in that doc:
High read, write, and delete rates to a narrow document range
Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors. This issue is known as hotspotting, and your application can experience hotspotting if it does any of the following:
Creates new documents at a very high rate and allocates its own monotonically increasing IDs.
Cloud Firestore allocates document IDs using a scatter algorithm. You should not encounter hotspotting on writes if you create new documents using automatic document IDs.
Creates new documents at a high rate in a collection with few documents.
Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
Deletes documents in a collection at a high rate.
Writes to the database at a very high rate without gradually increasing traffic.
I won't repeat all the advice in that doc. What you do need to know is this: because of the way Firestore is built to scale massively, limits are placed on how quickly you can write data into it. The fact that you have to scale up gradually is probably going to be your main problem, and there is no way around it.
I achieved my needs with batched writes. But according to the Firestore documentation there is another, faster way:
Note: For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes. You should use a server client library for bulk data operations and not a mobile/web SDK.
I also recommend taking a look at this Stack Overflow post with examples in Node.js.
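For reference, a minimal sketch of the batched approach in Python, using the same collection and fields as the question (the 500-operation cap per batch is a documented Firestore limit):
from google.cloud import firestore

db = firestore.Client()
BATCH_SIZE = 500  # a single batch accepts at most 500 operations

def write_users(users):
    # users: an iterable of dicts shaped like the question's data.
    batch = db.batch()
    count = 0
    for user in users:
        doc_ref = db.collection(u'my_collection').document(user['user_id'])
        batch.set(doc_ref, {
            u'map_field': user['map_field'],
            u'date_last_seen': user['date_last_seen'],
        })
        count += 1
        if count % BATCH_SIZE == 0:
            batch.commit()
            batch = db.batch()
    if count % BATCH_SIZE != 0:
        batch.commit()  # commit the final partial batch
If your version of the Python server client library provides it, db.bulk_writer() performs parallelized individual writes for you, which is closer to what the note above recommends.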

Firestore Document "Too much contention": such thing in realtime database?

I've built an app that lets people sell tickets for events. Whenever a ticket is sold, I update the document that represents that ticket of the event in Firestore to update the stats.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention", which results in inaccurate stats since the stat update is dropped. I guess this is the result of the high load on the document.
To resolve this problem, I am considering moving the stats of the items from the item document in Firestore to the Realtime Database. Before I do, I want to be sure that this will actually resolve the contention problem I had on my item document. Can the Realtime Database handle such a load better than a Firestore document? Is it considered good practice to move such data to the Realtime Database?
The issue you're running into is a documented limit of Firestore. There is a limit to the rate of sustained writes to a single document of 1 per second. You might be able to burst writes faster than that for a while, but eventually the writes will fail, as you're seeing.
Realtime Database has different documented limits. It's measured in the total volume of data written to the entire database. That limit is 64MB per minute. If you want to move to Realtime Database, as long as you are under that limit, you should be OK.
If you are effectively implementing a counter or some other data aggregation in Firestore, you should also look into the distributed counter solution that works around the per-document write limit by sharding data across multiple documents. Your client code would then have to use all of these document shards in order to present data.
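As a rough Python sketch of that sharded counter (collection names and the shard count are made up; the pattern follows the linked solution):
import random
from google.cloud import firestore

db = firestore.Client()
NUM_SHARDS = 10  # raises the sustained ceiling to roughly NUM_SHARDS writes/second

def increment_sold(event_id):
    # Each update lands on a random shard document, so no single
    # document absorbs the full write rate.
    shard_id = str(random.randint(0, NUM_SHARDS - 1))
    shard_ref = (db.collection(u'events').document(event_id)
                   .collection(u'shards').document(shard_id))
    shard_ref.set({u'count': firestore.Increment(1)}, merge=True)

def get_sold(event_id):
    # Reading the total costs one read per shard document.
    shards = db.collection(u'events').document(event_id).collection(u'shards').stream()
    return sum(s.to_dict().get(u'count', 0) for s in shards)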
As for whether or not any one of these is a "good practice", that's a matter of opinion, which is off topic for Stack Overflow. Do whatever works for your use case. I've heard of people successfully using either one.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention"
This is happening because Firestore cannot handle such a rate. According to the official documentation regarding quotas for writes and transactions:
Maximum write rate to a document: 1 per second
Sometimes it might work for two or even three writes per second, but at some point it will definitely fail. Ten writes per second is way too much.
To resolve this problem, I am considering moving the stats of the items from the item document in Firestore to the Realtime Database.
That's a solution that I myself use for such cases.
According to the official documentation regarding usage and limits in the Firebase Realtime Database, there is no such limitation there. But it's up to you to decide if it fits your needs or not.
There is one more thing that you need to take into consideration, which is a distributed counter. It can solve your problem for sure.

What is the most cost-efficient method of making document writes/reads from Firestore?

Firebase's Cloud Firestore gives you limits on the number of document writes and reads (and deletes). For example, the Spark plan (free) allows 50K reads and 20K writes a day. Estimating how many writes and reads your app needs is obviously important when developing it, as you will want to know the potential costs incurred.
Part of this estimation is knowing exactly what counts as a document read/write. This part is somewhat unclear from searching online.
One document can contain many different fields, so if an app is designed such that user actions done through a session require the fields within a single document to be updated, would it be cost-efficient to update all the fields in one single document write at the end of the session, rather than writing the document every single time the user wants to update one field?
Similarly, would it not make sense to read the document once at the start of a session, getting the values of all fields, rather than reading them when each is needed?
I appreciate that this method will lead to the user seeing slightly out-of-date field values, and admittedly the database will lag behind, but if such things aren't too much of a concern to you, couldn't such a method reduce your reads/writes by a large factor?
This all depends on what counts as a document write/read (does writing 20 fields within the same document in one go count as 20 writes?).
The cost of a write operation has no bearing on the number of fields you write. It's purely based on the number of times you call update() or set() on a document reference, whether independently, in a transaction, or in a batch.
If you choose to write N fields using N separate updates, then you will be charged N writes. If you choose to write N fields using 1 update, then you will be charged 1 write.
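A quick Python illustration, with hypothetical document and field names:
from google.cloud import firestore

db = firestore.Client()
doc_ref = db.collection(u'users').document(u'alice')

# Three separate calls: billed as three writes.
doc_ref.update({u'score': 10})
doc_ref.update({u'level': 2})
doc_ref.update({u'last_seen': firestore.SERVER_TIMESTAMP})

# One call touching the same three fields: billed as one write.
doc_ref.update({
    u'score': 10,
    u'level': 2,
    u'last_seen': firestore.SERVER_TIMESTAMP,
})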

Understanding Firestore Pricing

Before creating a new app I want to make sure I get the pricing model correct.
For example, in a phonebook app I have a collection called userList that contains my users as individual documents.
I have 50k users on my list, which means I have 50k documents in my collection.
If I were to get the userList collection, it would read all 50k documents.
Firestore allows 50k free document reads a day. Does that mean 50k document reads in total, or 50k reads per document?
As in the example of my phonebook app, if it is 50k document reads in total, I will use up the free limit in just one get call.
If you actually have to pull an entire collection of 50k documents, the question you likely should be asking is how to properly structure a Firestore database.
More than likely you need to filter these documents based on some criteria within them by using the query WHERE clause. Having each client device hold 50k documents locally sounds like poor database planning and possibly a security risk.
Each returned document from your query counts as 1 read. If there are no matches to your query, 1 read is charged. If there are 50k matches, there are 50k reads charged.
For example, you can retrieve the logged in user's document and be charged 1 read with something like:
db.collection('userList').where('uid', '==', clientUID).get()
Note: As of 10/2018, Firestore charges 6 cents (USD) per 100k reads after the first 50k/day.
The free quota is for your entire project. So you're allowed 50,000 document reads across the entire project.
Reading 50K user profile documents will indeed use that free quota in one go.
Reading large numbers of documents is in general something you should try to prevent when using NoSQL databases.
The client apps that access Firestore should only read data that they're going to immediately show to the user. And there's no way you'll fit 50K users on a screen.
So more likely you have a case where you're aggregating over the user collection. E.g. things like:
Count the number of users
Count the number of users named Frank
Calculate the average length of the user names
NoSQL databases are usually more limited in their query capabilities than traditional relational databases, because they focus on ensuring read-scalability. You'll frequently do extra work when something is written to the database, if in exchange you can get better performance when reading from the database.
For better performance you'll want to store these aggregated values in the database, and then update them whenever a user profile is written. So you'll have a userCount, a document with a count for each unique username, and an averageUsernameLength.
For an example of how to run such aggregation queries, see: https://firebase.google.com/docs/firestore/solutions/aggregation. For lower write volumes, you can also consider using Cloud Functions to update the counters.
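As a sketch of that write-time aggregation in Python (the aggregates documents and field names are invented for illustration, and usernames are assumed to be simple field-name-safe strings):
from google.cloud import firestore

db = firestore.Client()

def create_user(user_id, name):
    # One batch writes the profile and bumps the aggregates atomically.
    batch = db.batch()
    batch.set(db.collection(u'userList').document(user_id), {u'name': name})
    batch.set(db.collection(u'aggregates').document(u'users'), {
        u'userCount': firestore.Increment(1),
        u'totalNameLength': firestore.Increment(len(name)),
    }, merge=True)
    batch.set(db.collection(u'aggregates').document(u'usersByName'), {
        name: firestore.Increment(1),  # count per unique username
    }, merge=True)
    batch.commit()

# Average username length is then totalNameLength / userCount, computed on read.
Note that at a high signup rate this single aggregates document would itself hit the per-document write limit, which is where the distributed counters from the linked solution come in.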
Don't fetch all the users in one go. You can limit your query to a fixed number of users, and when the user scrolls, your query will get more. Since no one is going to scroll through 50k users, you can get rid of a bundle of cost. This is something like saving memory with a RecyclerView.
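With the Python server client that pagination looks something like this (the ordering field and page size are arbitrary, and the mobile SDKs expose equivalent limit/start-after cursors):
from google.cloud import firestore

db = firestore.Client()
users = db.collection(u'userList')

# First page: 25 reads instead of 50,000.
first_page = list(users.order_by(u'name').limit(25).stream())

# When the user scrolls, resume after the last document already shown.
if first_page:
    next_page = list(users.order_by(u'name')
                          .start_after(first_page[-1])
                          .limit(25)
                          .stream())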
