Increment counter across multiple subcollections - firebase

I am structuring an app where users can increment an attendee counter when they click "go" on an event. To simplify reads, events are duplicated into "events" subcollections for every user who follows a particular organizer. Every time a user clicks "go" on an event, this counter has to be incremented across all of those subcollections. Is performing a join a better alternative than denormalizing the data in this case?

There is no "right" or "wrong" when it comes to modeling. If what you've implemented satisfies your app's needs, within Firestore's limitations, do whatever works for you.
But if your goal is to avoid 10,000 writes by performing an extra lookup instead, then by all means save yourself that trouble and cost. You might want to estimate the cost or performance of both approaches and choose whichever is cheaper or faster, or whatever else you need out of this solution.
Bear in mind that a single transaction or batched write is limited to 500 operations, so if you need to update 10,000 documents, you won't be able to do so atomically. Some documents might simply never receive the update if your code isn't resilient to errors or incomplete work. That alone could be reason enough not to fan out a write to so many documents.
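For illustration, here is a minimal fan-out sketch using the Firebase modular web SDK. It assumes each duplicated copy stores the original event ID in a hypothetical eventId field (a collection-group index on it is required), and note that the separate commits are not atomic with respect to one another:

```ts
import {
  getFirestore, collectionGroup, query, where, getDocs,
  writeBatch, increment,
} from "firebase/firestore";

async function incrementGoingCount(eventId: string): Promise<void> {
  const db = getFirestore();

  // Find every duplicated copy of the event across all users' subcollections.
  const copies = await getDocs(
    query(collectionGroup(db, "events"), where("eventId", "==", eventId))
  );

  // A batched write is capped at 500 operations, so commit in chunks.
  // Note: the chunks are NOT atomic with respect to one another.
  let batch = writeBatch(db);
  let ops = 0;
  for (const snap of copies.docs) {
    batch.update(snap.ref, { goingCount: increment(1) });
    if (++ops === 500) {
      await batch.commit();
      batch = writeBatch(db);
      ops = 0;
    }
  }
  if (ops > 0) await batch.commit();
}
```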

Related

Firestore data model for events planning app

I am new to Firestore and building an event planning app, but I am unsure of the best way to structure the data, taking into account the speed of queries and Firestore costs based on reads, etc. In both options I can think of, I have a users collection and an events collection.
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and events they are attending. I then query the events collection for those eventIds so I can list the appropriate events for that user.
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where hostId === user.id or where attendeeIds.includes(user.id).
I am trying to figure out which is best from a performance and cost perspective, taking into account that there could be thousands of events to iterate through. Is it better to search the events collection by eventId, since it stops once all the events are found, or is that slow since it searches for one eventId at a time? Maybe there is a better way to do this that I haven't mentioned above. Would really appreciate the feedback.
In addition to @Dharmaraj's answer, please note that neither solution is better than the other in terms of performance. In Firestore, query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you search for 10 documents in a collection of 100 documents or in a collection that contains 100 million documents; the response time will always be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind that in the NoSQL world, we always structure a database according to the queries we intend to perform. So if a query returns the documents you're interested in and produces the fewest reads, that's the solution you should go with. Also remember that you'll always pay a number of reads equal to the number of documents the query returns.
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to fetch the user's document first and then run another query like where(documentId(), "in", [...userEvents]) (which only accepts a limited number of IDs per query), or fetch each event individually if there are many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update the event by resource.data.hostId == request.auth.uid.
When using the first option, you'll have to fetch the user's document inside the security rules to check whether the eventID is present in that events array (which may cost you another read). Check out the documentation for more information on billing.
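For reference, option 2 comes down to two simple queries. A minimal sketch with the modular web SDK, using the hostId and attendeeIds field names from the question:

```ts
import { getFirestore, collection, query, where, getDocs } from "firebase/firestore";

async function loadUserEvents(uid: string) {
  const db = getFirestore();
  const events = collection(db, "events");

  // Events the user hosts...
  const hosting = await getDocs(query(events, where("hostId", "==", uid)));
  // ...and events the user attends, in a second query.
  const attending = await getDocs(
    query(events, where("attendeeIds", "array-contains", uid))
  );

  return { hosting: hosting.docs, attending: attending.docs };
}
```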

What would be an efficient alternative to "JOIN" in Firestore (NoSQL)?

I have a users collection and a transactions collection.
I need to get a user's balance by computing it from his/her transactions.
I have also heard that you are allowed to duplicate and denormalize your data to achieve fewer document reads per request (reading many documents costs more).
My approaches:
Make transactions a "subcollection" of the user document, so that you only fetch that user's documents and compute the needed values on the client side.
Make both top-level collections and somehow perform "JOIN" queries to get his/her transactions, then compute the value on the client side.
Simply keep a field named "balance" in the user's document and update it every time they make a transaction (but this seems less adaptable to future changes).
Which approach is efficient? Or maybe there are totally different approaches?
Which approach is efficient?
The third one.
Or maybe there are totally different approaches?
Of course there are, but the third is by far the best and cheapest one. Every time a new transaction is performed, simply increment the "balance" field atomically using FieldValue.increment().
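A minimal sketch with the modular web SDK, where the equivalent helper is increment(); the users path is an assumption:

```ts
import { getFirestore, doc, updateDoc, increment } from "firebase/firestore";

// `amount` is positive for credits and negative for debits.
async function applyTransaction(uid: string, amount: number): Promise<void> {
  const db = getFirestore();
  // increment() is applied atomically on the server, so concurrent
  // transactions cannot clobber each other's balance updates.
  await updateDoc(doc(db, "users", uid), { balance: increment(amount) });
}
```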

Firestore: Is it feasible to handle replies to a conversation thread in a property (map/array) instead of a subcollection?

I have a collection of "Chats", each of which has a number of "Threads" subcollections (think Microsoft Teams, Slack threads, or Discord threads).
Each "Thread" can have a number of replies, and I was wondering whether it was feasible to make these replies a property of the "Thread" document rather than using a "Replies" subcollection. According to the pricing table this would reduce costs, but from what I can see it would be bad practice, and testing it out has shown that it's harder to implement.
It is feasible; just ensure that replies are written inside a transaction, since concurrent writes to the same document can collide. Additionally, include a timestamp on each reply to maintain the order and time of creation.
Using one document does potentially limit you to roughly 7,000-10,000 lines of text.
Ultimately, there is no clean way of handling nested sub-comments, and you will have to use whichever approach best suits your needs. Personally, I prefer the Realtime Database with a limit/orderBy query and paginated comments.
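A minimal sketch of that transaction with the modular web SDK; the chats/{chatId}/threads/{threadId} path and the reply shape are assumptions, and serverTimestamp() cannot be used inside array elements, hence the client-side Timestamp:

```ts
import { getFirestore, doc, runTransaction, Timestamp } from "firebase/firestore";

async function addReply(chatId: string, threadId: string, uid: string, text: string) {
  const db = getFirestore();
  const threadRef = doc(db, "chats", chatId, "threads", threadId);

  await runTransaction(db, async (tx) => {
    const snap = await tx.get(threadRef);
    const replies = (snap.data()?.replies ?? []) as object[];
    // serverTimestamp() is not allowed inside array elements, so a
    // client-side Timestamp stands in for ordering.
    tx.update(threadRef, {
      replies: [...replies, { uid, text, createdAt: Timestamp.now() }],
    });
  });
}
```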
Each "Thread" can have a number of replies, and I was wondering whether it was feasible to make these replies a property of the "Thread" document rather than using a "Replies" subcollection.
This solution will work as long as the size of the document stays below 1 MiB. In my opinion, storing the replies of a conversation in an array is the best option to go with. Since the replies are just strings, you can store a lot of them, maybe hundreds or even thousands. If you think a single conversation might get more replies than fit into a single document, then consider sharding them across multiple documents.
Find a predictable name for each document, maybe a date or a date-and-time window, and that's it. You can reduce the number of reads considerably. Imagine a conversation with 2,000 replies: are you willing to pay 2,000 reads instead of only one or two document reads?
Storing the replies in the Firebase Realtime Database is also an option to consider, as it is better suited to small pieces of information like replies, and it doesn't have the cost of document reads. However, it will be costly in terms of bandwidth if there are lots of replies.
So in the end it's up to you to do the math and choose which option is cheaper. If you choose to have the conversations in Firestore and the replies in the Realtime Database, please note that both work really well together.
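For example, sharding by day could look like the sketch below; the replyShards subcollection name and the reply shape are hypothetical:

```ts
import { getFirestore, doc, setDoc, arrayUnion, Timestamp } from "firebase/firestore";

async function addShardedReply(conversationId: string, uid: string, text: string) {
  const db = getFirestore();
  // One shard document per conversation per day, e.g. "2024-05-17";
  // the ID is predictable, so today's replies are a single direct read.
  const day = new Date().toISOString().slice(0, 10);
  const shardRef = doc(db, "conversations", conversationId, "replyShards", day);

  await setDoc(
    shardRef,
    { replies: arrayUnion({ uid, text, createdAt: Timestamp.now() }) },
    { merge: true } // creates the shard on the first reply of the day
  );
}
```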

Firestore, Array vs documents list

Suppose a user has a thousand friends; when showing the friend list on a specific screen, fetching a thousand documents is expensive and time-consuming, and even with pagination there is added latency from the extra requests.
According to the official documentation, a document can hold 1 MiB, that is, roughly a million characters. However, my worry about using arrays is that there will be situations where things get complicated in many ways.
Are there any exact standards?
You seem to be fully aware of the limitations of Firestore in this case. There is no new information that will help you here.
If you have a list of things to store in Firestore, and that list could exceed the maximum size of a document (1 MiB, as you correctly state), then you are going to have a problem. On the other hand, if you put each of those items in its own document, you're going to have to pay for all of those reads. That's the tradeoff, and you will have to decide which problem is worse. Typically, people choose separate documents so they are not limited by the size of a single document. But that's your call to make.
You could try to shard that list across multiple documents somehow, but that will add a lot of complexity to your code. Again, your call to make.
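If you do go the sharding route, the write side can be fairly mechanical. A sketch assuming a hypothetical friendChunks subcollection and a chunk size chosen to stay well under the document size limit:

```ts
import { getFirestore, doc, writeBatch } from "firebase/firestore";

// ~5,000 short IDs per chunk keeps each document far below the 1 MiB limit.
const CHUNK_SIZE = 5000;

async function saveFriendList(uid: string, friendIds: string[]): Promise<void> {
  const db = getFirestore();
  const batch = writeBatch(db);

  // users/{uid}/friendChunks/0, /1, /2, ... each hold one slice of the list.
  for (let i = 0; i * CHUNK_SIZE < friendIds.length; i++) {
    const slice = friendIds.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
    batch.set(doc(db, "users", uid, "friendChunks", String(i)), { ids: slice });
  }
  await batch.commit();
}
```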

Firestore, atomic writes/updates on more than 500 documents

One of the main reasons for using Firestore batched writes is that they are atomic and ensure data consistency. However, they have a limit of 500 operations. In a large application, one may have denormalized user data in more than 500 documents. So when a user updates any of his/her profile details, I have to update them in all of those documents while maintaining data consistency (atomic updates) at the same time.
An intuitive solution would be maintaining an array of batches, keeping track of those that fail, and then retrying the failed batches manually.
However I want to ask that:
1) Are there any best practices or other easier, more reliable methods of achieving this? Considering the limit of 500 operations per batch, most commercial apps must face the same issue.
2) Is there a smarter approach than just denormalizing data, so that this whole data-consistency issue can be avoided in the first place?
An intuitive solution would be maintaining an array of batches, keeping track of those that fail, and then retrying the failed batches manually.
That's a viable solution that you can go ahead with.
1) Are there any best practices or other easier, more reliable methods of achieving this? Considering the limit of 500 operations per batch, most commercial apps must face the same issue.
I can tell you what I do. I usually create a counter variable and increment it every time I add an update operation to the batch. Each time the counter is incremented, check whether it has reached 500; if so, commit the current batch, reset the counter, and start a new batch, picking up where you left off. Do this until all the writes are done.
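That pattern, combined with the retry bookkeeping from the question, might look like the following sketch (modular web SDK; updateAll and its chunking are illustrative, not a library API):

```ts
import { getFirestore, doc, writeBatch } from "firebase/firestore";

// Apply the same update to every document path, committing every 500
// operations and collecting the chunks that fail so they can be retried.
async function updateAll(
  paths: string[],
  data: Record<string, any>
): Promise<string[][]> {
  const db = getFirestore();
  const failedChunks: string[][] = [];

  for (let i = 0; i < paths.length; i += 500) {
    const chunk = paths.slice(i, i + 500);
    const batch = writeBatch(db);
    chunk.forEach((path) => batch.update(doc(db, path), data));
    try {
      await batch.commit();
    } catch {
      failedChunks.push(chunk); // retry these later
    }
  }
  return failedChunks;
}
```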
2) Is there a smarter approach than just denormalizing data, so that this whole data-consistency issue can be avoided in the first place?
The problem with batched writes cannot be solved through denormalization; duplicating data isn't a solution here.
