What would be an efficient alternative to "JOIN" in Firestore (NoSQL)?

I have a users collection and a transactions collection.
I need to get a user's balance by computing his/her transactions.
And I've heard that you're allowed to duplicate and denormalize your data to achieve fewer document reads per request (reading many documents costs more).
My approaches:
1. Set the transactions collection as a subcollection of the user document, so that you only read that user's documents and compute the needed values on the client side.
2. Keep both collections as separate top-level collections and somehow make "JOIN"-like queries to get his/her transactions, then compute the value on the client side.
3. Just keep a field named "balance" in the user's document and update it every time they make a transaction. (But this seems not very adaptable to changes that might be made in the future.)
Which approach is efficient? Or maybe there are totally different approaches?

Which approach is efficient?
The third one.
Or maybe there are totally different approaches?
Of course there are, but the third is by far the best and cheapest one. Every time a new transaction is performed, simply increment the "balance" field using Firestore's atomic increment() operation.
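A minimal sketch with the Firebase v9 modular web SDK might look like this (the users/transactions layout and the field names are assumptions based on the question, not a prescribed schema):

    import { getFirestore, doc, collection, writeBatch, increment } from "firebase/firestore";

    const db = getFirestore();

    // Record the transaction and keep the denormalized balance in sync
    // in one atomic batch. Collection and field names are assumptions.
    async function addTransaction(uid: string, amount: number) {
      const batch = writeBatch(db);

      // 1. Store the transaction itself (here in a subcollection of the user).
      const txRef = doc(collection(db, "users", uid, "transactions"));
      batch.set(txRef, { amount, createdAt: new Date() });

      // 2. Atomically bump the duplicated "balance" field on the user document.
      batch.update(doc(db, "users", uid), { balance: increment(amount) });

      await batch.commit();
    }

Because both writes go through one batch, the stored balance never drifts from the recorded transactions even if the client crashes mid-operation.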

Related

Firestore data model for events planning app

I am new to Firestore and building an event planning app, but I am unsure of the best way to structure the data, taking into account query speed and Firestore costs based on reads etc. In both options I can think of, I have a users collection and an events collection.
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and also for events they are attending. Then I query the events collection for those eventIds of that user so I can list the appropriate events to the user.
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where hostId === user.id and for events where attendeeIds.includes(user.id).
I am trying to figure out which is best from a performance and cost perspective, taking into account that there could be thousands of events to iterate through. Is it better to search the events collection by eventId, since it will stop iterating when all the events are found, or is that slow since it will be searching for one eventId at a time? Maybe there is a better way to do this that I haven't mentioned above. Would really appreciate the feedback.
In addition to @Dharmaraj's answer, please note that neither solution is better than the other in terms of performance. In Firestore, query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you search 10 documents in a collection of 100 documents or in a collection that contains 100 million documents; the response time will always be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind that in the NoSQL world we always structure a database according to the queries that we intend to perform. So if a query returns the documents that you're interested in, and produces the fewest reads, then that's the solution you should go ahead with. Also remember that you'll always pay for a number of reads equal to the number of documents the query returns.
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to query the user's document in the first place and then run another query like where(documentId(), "in", [...userEvents]) or fetch each of them individually if you have many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update it with resource.data.hostId == request.auth.uid.
When using the first option, you'll have to read the user's document in the security rules to check whether the eventId is present in that events array (which may cost you another read). Check out the documentation for more information on billing.
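For reference, the option 2 queries could look roughly like this with the v9 modular SDK (a sketch; hostId and attendeeIds come from the question, everything else is assumed):

    import { getFirestore, collection, query, where, getDocs } from "firebase/firestore";

    const db = getFirestore();
    const events = collection(db, "events");

    async function loadUserEvents(uid: string) {
      // Events the user is hosting.
      const hosting = await getDocs(query(events, where("hostId", "==", uid)));
      // Events the user is attending.
      const attending = await getDocs(query(events, where("attendeeIds", "array-contains", uid)));
      return { hosting: hosting.docs, attending: attending.docs };
    }

Two separate queries against the events collection cover both cases without ever touching the user document.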

Firestore evergrowing collection

I'm working on an app where users create certain events in a calendar.
I was thinking on structuring the calendar events data as follows:
allEventsEver/{yearId}/months/{monthId}/events/{eventId}
I understand that
Firestore is optimized for storing large collections of small documents
but the structure above would mean that this would be an ever-growing collection. Is this something I should worry about? Would it be better to create a new collection for each year, e.g.:
2022/months/{monthId}/events/{eventId}
2023/months/{monthId}/events/{eventId}
Also, should I avoid using year/month value as document id (e.g. 2022) as those would be considered sequential ids that could cause hotspots that impact latency? If yes, what other approach do you suggest?
The most important/unique performance guarantee Firestore gives is that its query performance is independent of the number of documents in the collection. Query performance only depends on how much data you return, not on how much data needs to be considered.
So an ever-growing collection is not a concern on Firestore. As long as you put a limit on how many results your query can return, you'll have an upper bound on how much time it will take.
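In practice that just means bounding every query, along these lines (a sketch; the collection path and the startTime field are assumptions):

    import { getFirestore, collection, query, orderBy, limit, getDocs } from "firebase/firestore";

    const db = getFirestore();

    // Read one bounded page of events; cost and latency depend on the 50
    // documents returned, not on how many events the collection holds.
    async function loadEventsPage() {
      return getDocs(
        query(
          collection(db, "allEventsEver/2022/months/01/events"),
          orderBy("startTime"),
          limit(50)
        )
      );
    }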

Increment counter across multiple subcollections

I am building an app where users can increment an attendee counter when they tap "go" on an event. To simplify reads, events are duplicated in "events" subcollections across all users that follow a particular organizer. Every time a user taps "go" on an event, this counter has to be incremented across all those subcollections. Is making a join a better alternative than denormalizing the data in this case?
There is no "right" or "wrong" when it comes to modeling. If your app has its needs satisfied by what you've implemented, combined with knowledge of Firestore's limitations, do whatever you want.
But if your goal is to avoid 10000 writes by performing an extra lookup, then by all means, save yourself that trouble and cost. You might want to do a cost or performance estimate of both approaches, and choose the one that's cheaper or faster, or whatever you need out of this solution.
Bear in mind that there is a limitation of 500 documents in a single transaction, so if you need to update 10000 documents, you won't be able to do so atomically. It's possible that some documents simply might not get the update if your code isn't resilient to errors or incomplete work. That could be enough reason not to fan out a write to so many documents.
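If you do decide to fan the write out, a chunked (and therefore non-atomic) update could be sketched like this, assuming each duplicated event document carries an eventId field and a goingCount counter (both names are assumptions):

    import { getFirestore, collectionGroup, query, where, getDocs, writeBatch, increment } from "firebase/firestore";

    const db = getFirestore();

    async function incrementGoing(eventId: string) {
      // Find every duplicated copy of the event across users' "events" subcollections.
      const copies = await getDocs(
        query(collectionGroup(db, "events"), where("eventId", "==", eventId))
      );

      // Batches are capped at 500 writes, so the fan-out has to be chunked
      // and is not atomic as a whole.
      const snaps = copies.docs;
      for (let i = 0; i < snaps.length; i += 500) {
        const batch = writeBatch(db);
        for (const snap of snaps.slice(i, i + 500)) {
          batch.update(snap.ref, { goingCount: increment(1) });
        }
        await batch.commit();
      }
    }

The loop illustrates the trade-off from the answer above: each commit can fail independently, so the copies can temporarily disagree unless you add retry handling.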

Write millions of documents into Riak

What is the best way to add a huge number of documents to Riak? Let's say there are millions of product records, which change very often (prices, ...), and we want to update all of them very frequently. Is there a better way than replacing keys one by one in Riak? Something like a bulk set of 1000 documents at once...
There are unfortunately no bulk operations available in Riak, so this has to be done by updating each object individually. If your updates arrive in bulk, however, it may be worthwhile revisiting your data model. If you can de-normalise your products, perhaps by storing a range of products in a single object, it might be possible to reduce the number of updates that need to be performed by grouping them, thereby reducing the load on the cluster.
When modelling data in Riak you usually need to look at access and query patterns in addition to the structure of the data, and make sure that the model supports all types of queries and latency requirements. This quite often means de-normalising your model by either grouping or duplicating data in order to ensure that updates and queries can be performed as efficiently as possible, ideally through direct K/V access.
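A rough sketch of that grouping idea over Riak's HTTP interface (the bucket/key scheme, the group size, the product shape, and the /buckets/{bucket}/keys/{key} endpoint are assumptions about your setup, not a prescribed layout):

    interface Product {
      id: string;
      price: number;
    }

    // Derive one Riak key per range of 1000 product ids, so price updates that
    // arrive in bulk can be written as a handful of grouped objects instead of
    // thousands of single-key replacements.
    function groupKey(productId: number, groupSize = 1000): string {
      const start = Math.floor(productId / groupSize) * groupSize;
      return `products_${start}_${start + groupSize - 1}`;
    }

    async function writeGroup(riakUrl: string, key: string, products: Record<string, Product>) {
      // One PUT replaces the whole group in a single request.
      await fetch(`${riakUrl}/buckets/products/keys/${key}`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(products),
      });
    }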

Mongodb automatically write into capped collection

I need to manage the acquisition of many records per hour, about 1,000,000 records, and every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of trying a capped collection to get only the last record for every primary key. In order to do this, I make two separate inserts. Is there a way, in MongoDB, to make some kind of trigger that propagates an insert into one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
I disagree that triggers are needed. MongoDB was created to be very fast and to provide only basic functionality; that is the power of this solution.
I think the best approach here is to create the "triggers" inside your application, as part of the data access layer.
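A minimal sketch of such an application-level "trigger" with the Node.js MongoDB driver (connection string, database, collection, and field names are placeholders):

    import { MongoClient } from "mongodb";

    const client = new MongoClient("mongodb://localhost:27017");

    // The same write path inserts into the main collection and then
    // propagates the record to the capped collection used for fast
    // "latest value per key" reads.
    async function insertReading(reading: { key: string; value: number; ts: Date }) {
      const db = client.db("acquisition");
      await db.collection("readings").insertOne(reading);
      await db.collection("latestReadings").insertOne(reading); // capped collection
    }

    async function main() {
      await client.connect();
      await insertReading({ key: "sensor-42", value: 3.14, ts: new Date() });
      await client.close();
    }

    main().catch(console.error);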
