Cloud Firestore Data Structure - firebase

I am creating an application that uses Cloud Firestore to store data about "events" in our lab on several assets. We have collected data for a few months and are averaging about 2,000 events per asset per month. Each event captures a few pieces of metadata that the user can query.
I imported all the data into firestore with a very simple layout at first.
Events (Collection of event data)
-> EventData (documents which contain a few metadata fields)
From my understanding, even if the collection of events becomes quite large, this won't be a problem for billing or query speed (assuming I do some sort of pagination on the query results). The composite indexes are also very manageable with this structure.
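For illustration, a paginated query against that flat layout might look something like the following sketch (Firebase Web SDK v9, TypeScript); the "assetName" and "timestamp" fields and the page size are assumptions, not the actual schema:

import { initializeApp } from "firebase/app";
import { getFirestore, collection, query, where, orderBy, limit, getDocs } from "firebase/firestore";

const app = initializeApp({ /* Firebase config */ });
const db = getFirestore(app);

// Fetch one page of events for a single asset, newest first.
// Only the returned documents are billed as reads.
async function fetchEventsPage(asset: string, pageSize = 50) {
  const q = query(
    collection(db, "Events"),
    where("assetName", "==", asset),   // assumed metadata field
    orderBy("timestamp", "desc"),      // assumed field; this filter + ordering needs one composite index
    limit(pageSize)
  );
  const snap = await getDocs(q);
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}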
The problem I see is that if someone opens that collection in the Firestore console, our read requests go through the roof. It seems that this does a full read on the entire collection, which of course will kill us on billing as time goes on. I don't see this as a problem forever, since eventually everything should be stable and we won't need to go into the console very often, but what if someone does when we have a million or more records?
My next thought was to structure the database like this:
Events -> Assets -> {Asset_Name} -> {year_month} -> {collection of documents with metadata fields}
This certainly solves the issue of the ever-growing collection of documents. The number of assets we have is fixed, and the number of events is (effectively) capped at a maximum amount per month as well. The problem with this setup, however, is managing composite indexes. There are about 5 indexes needed for my original setup. I think this alternative setup means I would need to set up the same 5 indexes for each collection of documents, for every asset, every month.
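To make the alternative concrete, here is one way the nested layout could be addressed (Firestore paths alternate collection/document, so the leaf collection name, "events" here, and the "timestamp" field are assumptions):

import { getFirestore, collection, query, orderBy, limit, getDocs } from "firebase/firestore";

const db = getFirestore();

// Events (coll) / Assets (doc) / {Asset_Name} (coll) / {year_month} (doc) / events (coll)
async function fetchMonthEvents(assetName: string, yearMonth: string) {
  const monthEvents = collection(db, "Events", "Assets", assetName, yearMonth, "events");
  const snap = await getDocs(query(monthEvents, orderBy("timestamp", "desc"), limit(50)));
  return snap.docs.map((d) => d.data());
}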
I thought maybe there could be a way to have a Cloud Function manage the indexes for me, but it doesn't appear there is an API for this. I think the number of composite indexes per project is also capped.
So, in the end, I am looking for recommendations on how to structure this database to limit reads when using the console, as well as to keep the indexes manageable. I am pretty new to NoSQL and perhaps I am just completely off base.

I recommend you keep your structure as is if that's what's working for you. You should not need to optimize for reducing console reads. Console reads do count towards your usage but the console does not load the entire collection when you open the console.
The console loads just enough documents to let you scroll a bit and then it loads more documents if you scroll down. It will only load the entire collection if you scroll through the entire collection.

Related

Should I create a duplicate collection/document for each use-case? (Firebase/Firestore)

I'm trying to build an e-commerce app with Firebase on the backend. I have a collection of 1000+ products, each stored as a separate document with product-specific info such as price, title, etc.:
document: {
  title: 'Some Title',
  price: '$99.99',
  genres: ['Horror', 'Action']
}
So in my app I need to display these products in many places, such as product carousels (similar to a bookshelf with arrow buttons at the ends), and also on a search results page.
On any given page, I assume I will need to display at least 50 products, either as search results or in multiple carousels. I understand that I can use queries to get this data from Firebase. But since each document I retrieve counts as (at least) one Firestore read, I assume that a typical user session would run into 100+ reads, if not thousands.
It seems a little inefficient to me that I need to read multiple documents to get this data, when I could just put all that data in a single array, as its own document. That would mean I get charged for one document read, not 50, per page.
Is this how it is expected to be done? Should I create a new document containing the data I need for each specific use case?
P.S. I'm pretty new to backend dev, let alone Firebase.
TL;DR: Yes, you should create a new document with the needed data for each specific use case, but it's not recommended to build it as a document with nested objects like arrays of 1000+ elements.
From a technical point of view, Cloud Firestore is optimized for storing large collections of small documents.
Depending on the use case, you can select the most appropriate Cloud Firestore data structure.
For example, the 10 best-selling books of the month can be a document with nested complex objects like arrays or maps. This structure can be useful for use cases with a small or predefined number of elements, but as stated here, if your data expands over time with larger or growing lists, the document also grows, which can lead to slower document retrieval times.
For a thousand or more records, a better choice is to structure your data as subcollections. That is, you can create collections within documents when you have data that might expand over time, with the main advantage that, as your lists grow, the size of the parent document doesn't change.
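As a rough sketch of the two shapes being contrasted (the collection and field names here are made up for illustration): a small, curated list fits in one document, while the open-ended catalog stays one document per product.

import { getFirestore, doc, getDoc, collection, query, where, limit, getDocs } from "firebase/firestore";

const db = getFirestore();

// Bounded, curated list: a single read returns the whole carousel.
async function fetchTopTenCarousel() {
  const snap = await getDoc(doc(db, "carousels", "top10-of-the-month")); // assumed path
  return snap.data()?.products ?? []; // small array of { title, price, ... }, kept deliberately short
}

// Open-ended catalog: one read per matching product document.
async function searchByGenre(genre: string) {
  const q = query(
    collection(db, "products"),
    where("genres", "array-contains", genre),
    limit(50)
  );
  const snap = await getDocs(q);
  return snap.docs.map((d) => d.data());
}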
Cloud Firestore also has several features to help you manage queries that return a large number of results:
Cursors, which allow you to resume a long-running query.
Page tokens, which help you paginate the query results.
Limits, which specify how many results to retrieve.
Offsets, which allow you to skip a fixed number of documents.
There are no additional costs for using cursors, page tokens, and limits. In fact, these features can help you save money by reading only the documents that you actually need.
As a best practice, do not use offsets. Instead, use cursors. Using an offset only avoids returning the skipped documents to your application, but these documents are still retrieved internally. The skipped documents affect the latency of the query, and your application is billed for the read operations required to retrieve them.
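A minimal sketch of cursor-based pagination, assuming a "products" collection ordered by a "title" field: the cursor is simply the last document of the previous page, so only the documents actually returned are billed.

import { getFirestore, collection, query, orderBy, limit, startAfter, getDocs, QueryDocumentSnapshot } from "firebase/firestore";

const db = getFirestore();
const pageSize = 50;

async function fetchPage(after?: QueryDocumentSnapshot) {
  const constraints = after
    ? [orderBy("title"), startAfter(after), limit(pageSize)]
    : [orderBy("title"), limit(pageSize)];
  const snap = await getDocs(query(collection(db, "products"), ...constraints));
  // Hand the last snapshot back to the caller; pass it in as `after` to get the next page.
  return { items: snap.docs.map((d) => d.data()), cursor: snap.docs[snap.docs.length - 1] };
}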

What constitutes a write action in Firestore?

I'm currently developing a Flutter web application using Firestore for data persistence. The app is not live in production, so I'm the only one accessing this backend. There is only one collection that holds a single document, with many nested fields (6 levels deep). My understanding from looking at https://firebase.google.com/docs/firestore/pricing, is that reads are counted per doc, so every time I reload my app it should count as one read, yet in the last 4 hours since I started working today I already hit 1.7K reads (as reported in the usage tab). I know I haven't reloaded the app that many times, and there's also no hidden loop that calls the collection multiple times.
This is the Flutter code that calls Firestore:
// Reference to the "source" collection
final sourceRef = FirebaseFirestore.instance.collection("source");
// Fetch the single "stats" document (counts as one document read)
var data = await sourceRef.doc("stats").get();
What am I missing please?
According to Firebase pricing, writes are defined as:
You are charged for each document read, write, and delete that you perform with Cloud Firestore.
Charges for writes and deletes are straightforward. For writes, each set or update operation counts as a single write.
Meaning that one document created is one write. If the same document is updated later, then Firebase counts it as one more write.
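For instance, a minimal sketch of that counting (Web SDK, TypeScript; the "stats/dashboard" path is just an assumption):

import { getFirestore, doc, setDoc, updateDoc, getDoc } from "firebase/firestore";

const db = getFirestore();

async function demoWriteCounting() {
  const ref = doc(db, "stats", "dashboard"); // assumed path
  await setDoc(ref, { visits: 0 });          // creating the document: 1 write
  await updateDoc(ref, { visits: 1 });       // updating the same document later: 1 more write
  await getDoc(ref);                         // and each get of the document: 1 read
}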
Here is a more detailed table that you can use for billing, and an example.
It is recommended to check the individual product usage in the "Usage" tab of the Firebase console, as this can narrow down which product is causing the elevated usage you are seeing.
I would highly recommend adding read and write logging to your application; that way, you can monitor how many reads and writes you actually perform.
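Something as simple as the following sketch would do (the wrapper name is made up; it mirrors the Flutter call above, but in TypeScript with the Web SDK):

import { getFirestore, doc, getDoc, DocumentReference, DocumentData } from "firebase/firestore";

const db = getFirestore();

// One log line per document read, so reads can be reconciled with the Usage tab.
async function loggedGet(ref: DocumentReference<DocumentData>) {
  console.log(`[firestore read] ${ref.path} at ${new Date().toISOString()}`);
  return getDoc(ref);
}

// A write counterpart wrapping setDoc/updateDoc would look the same.
async function loadStats() {
  const snap = await loggedGet(doc(db, "source", "stats"));
  return snap.data();
}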

Firestore listener updates while a large snapshot.get() is processing

I am looking for some advice on an approach. I am using Firebase Firestore, with batch and transaction updates (depending on the situation) to keep things as atomic as possible.
My application has many transactions (tens, and possibly hundreds, of thousands). On the user's dashboard I total some of the fields in those transactions by looping over ALL transactions and totaling by user, team, and challenge. Clearly, this is too costly to do each time a user hits the dashboard. I have tried a few approaches and am currently doing the following.
I store all totals in a collection, i.e. overall totals, team totals, user totals. These are the only items I need to fetch to show the dashboard, so it's very minimal compared to all the transactions.
I am doing all this work using firebase functions.
If I find that a collection of totals does NOT exist, I create it from scratch by looping through all transactions and, when done, saving the results/totals to the totals collection.
I then have a firebase function firestore listener that triggers whenever a transaction changes. Then I just update the totals based on the updated transaction so I don't need to read them all again.
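As a rough sketch of that trigger (Cloud Functions for Firebase, TypeScript; the "transactions" collection, the "amount" and "userId" fields, and the totals layout are assumptions, not the actual schema):

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

export const onTransactionWrite = functions.firestore
  .document("transactions/{txId}")
  .onWrite(async (change) => {
    const before = change.before.exists ? change.before.data() : undefined;
    const after = change.after.exists ? change.after.data() : undefined;

    // Apply only the delta, so existing totals never need to be recomputed.
    const delta = (after?.amount ?? 0) - (before?.amount ?? 0);
    if (delta === 0) return;

    const userId = (after ?? before)?.userId;
    await db.doc(`totals/user_${userId}`).set(
      { total: admin.firestore.FieldValue.increment(delta) },
      { merge: true }
    );
  });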
The thing I worry about is that if I receive an update while the main loop/calculation is running I will lose the data for that new/updated transaction and it will cause the totals to get out of sync.
Does anyone have an idea of how to get around this issue? Should I try to use Google Cloud Tasks to queue things? Any ideas on how to even keep track of whether that listener is running? Sort of like a semaphore or something (which I don't think Firebase Functions support).
I apologize if this is too vague. I don't need code or anything like that; I was just looking for some ideas on what would be a reasonable approach.

Does accessing the Firebase Firestore database dashboard count as read operations?

I am now in the development phase of the project. Currently the project only uses one Android app as the frontend. The queries from Android use limits and pagination, but the total number of documents read is way above the expected number.
I am trying to figure out why the number of document reads is so big even though there is only one user (me). I am scared the project will not be feasible if the number of reads is this big, which is why I need to figure out the Firestore read behaviour.
When I access the Firestore dashboard and select a collection, it shows a blue loading indicator and then shows all the documents available. Currently, the event collection has 52 documents. I have accessed all the documents in the event collection like this several times for debugging purposes.
So whenever I tap that event collection, I assume it will be counted as 52 read operations, so the read operations will not only come from the Android device but also from the dashboard? That's why the number of reads is so big. Am I right?
If that's the case...
Say I have 100,000 documents in the event collection; then whenever I tap that event collection, will I perform 100,000 read operations as well? Is there a way to limit these dashboard reads?
So the read operations will not only come from the Android device but also from the dashboard? That's why the number of reads is so big. Am I right?
Yes, you are right.
Say I have 100,000 documents in the event collection; then whenever I tap that event collection, will I perform 100,000 read operations as well?
No, you'll be charged only for the number of documents that belong to the first page. Inside the console, there is a pagination mechanism implemented especially for that, so you will not be charged for all the documents that exist in your collection.
Is there a way to limit these dashboard reads?
The limitation already exists, but be aware that the more you scroll down, the more documents are loaded, which means more read operations are charged.
One thing to bear in mind about the Firebase console is that it reflects changes to visible documents in real time, and each one of those changes also costs you a read. So, if you leave the console open while documents are changing, you will accumulate reads over time, even if you aren't actively using the console. This is a common source of unexpected reads.

Determining number of Firebase reads for nested sub-collection

I have a mobile solution (iOS) that is using Firebase to aid in syncing of data between a users devices. What I have works and allows me to keep clients in sync as I wanted to. However from testing, my reads are a bit out of control for larger data sets and I need to do some optimization. To that end, I wanted to make sure that my understanding of how reads are counted was correct (I am still a newbie at Firebase).
My data is structured like this (a collection/document hierarchy per user, per version, per animal type):
userid1 -> data -> version0 -> {Cats | Dogs | Birds} -> data -> {documents with fields such as modifiedDate}
It's a bit nested, I agree, but for all the use cases it seems to be the best way to do things to minimize redundancy; e.g. there are relationships between Cats and Dogs and Birds, but I only store one copy of each, not multiple. In addition, each user's data is segregated from the other users' data, and I need the ability to version the data. Put that all together, with the requirement to alternate collections and documents, and you get what you see.
Based on this structure, I can create queries like this:
Firestore.firestore()
    .collection("userid1").document("data")
    .collection("version0").document("Cats")
    .collection("data")
    .whereField("modifiedDate", isGreaterThanOrEqualTo: someDoubleValue)
    .getDocuments(completion: completionCallback)
This gets me the data I need and seems to only return the number of items I think it should. However, am I correct in saying that if there are 100 Cat-type documents (Cat1...Cat100), but only 3 of them have a modifiedDate greater than or equal to my query parameter, then when the data is returned to me I will only be "charged" for 3 reads? Or have I done something completely silly here, and am I getting charged for all 100 even though I only get 3 documents back in the callback?
The billing doesn't work any different for subcollections than it does for top-level collections. You are only billed for the documents transferred, not the entire set of documents in the collection (unless you do request every document).
Cloud Firestore scales massively, and it's expected that you might have a massive number of documents in a collection. Billing a read for each and every document in a collection for each query against that collection would be insanely expensive.
