I am making an app and I am trying to figure out why using nested collections is frowned upon in Firestore. The app is an expense tracking app, and the data is only relevant to the logged-in user; that user never cares about any other user. There are two ways I have found to structure the data, one of which uses a few more levels of nesting than the other. The structures below use the following notation:
collectionName: valueNames
subcollectionName: valueName
Structure 1 (Not as nested):
user:
    month: totalSpent, startDate, endDate
    transactions: categoryId, amount, timestamp
    categories: monthId, name, totalSpent
Structure 2 (More nested):
user:
    month: totalSpent, name, startDate, endDate
        categories: name, totalSpent
            transactions: categoryName, amount, timestamp
Can someone tell me the advantages of structure 1 as opposed to structure 2? Structure 2 seems easier to query, and I do not have to keep track of multiple IDs; I can just get the subcollection. It would also make it easier to pull up previous months later, when the user wants to analyze their spending.
Structure 1 allows you to view transactions and categories across multiple months. You cannot query across subcollections (see Is it possible to query multiple document sub-collections in Cloud Firestore?) and so with Structure 2 you would not be able to query all transactions across months or categories.
Explained
With Structure 2 you would need to query the months first, then pick a single month and query the categories within that month, then pick a category (or iterate over each one) and query for the transactions in that category. To aggregate category spending for the year you would need to make 12 calls, one for each month.
With Structure 1 you could query all transactions, limit by date range, limit by category, or any combination of those. You could also query all categories for the year in one go to sum the values for a year overview. Structure 1 gives you far more flexible and performant queries.
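For illustration, a minimal Swift sketch of the kind of query Structure 1 makes possible (the collection and field names follow the structures above; uid, categoryId, and the two Date bounds are assumed to exist already):
import FirebaseFirestore

let db = Firestore.firestore()
// uid, categoryId, startOfYear, and startOfNextYear (Date values) are assumed
// to exist; collection and field names follow the Structure 1 sketch above.
let transactionsRef = db.collection("users").document(uid).collection("transactions")

// One query over all of the user's transactions: a single category,
// restricted to a date range, newest first (only possible with Structure 1).
transactionsRef
    .whereField("categoryId", isEqualTo: categoryId)
    .whereField("timestamp", isGreaterThanOrEqualTo: startOfYear)
    .whereField("timestamp", isLessThan: startOfNextYear)
    .order(by: "timestamp", descending: true)
    .getDocuments { snapshot, error in
        let documents = snapshot?.documents ?? []
        let total = documents.reduce(0.0) { $0 + ($1.data()["amount"] as? Double ?? 0) }
        print("Spent \(total) across \(documents.count) transactions")
    }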
Summary
Remember, Firestore is not like Firebase Realtime Database where you can select all the data in a given tree structure at once. You will need to make a query at each level of the tree (each collection) to pull data.
There is nothing wrong with creating those collections as long as you remember to delete them along with the parent document. In short, deleting a document does not delete the subcollections it contains. This is how it works:
Each document in Cloud Firestore contains a reference (path) to the subcollections within it, not the whole subcollection. So when you delete a document, every field gets deleted, including the path that pointed to the subcollection. Meanwhile, the actual subcollection is left behind as orphaned data that you can no longer reach, because you deleted its path reference.
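Until such a feature exists, you have to clean up subcollections yourself. A rough Swift sketch, assuming the month/transactions layout from the question and a small subcollection (batched writes are capped at 500 operations, so a large subcollection would need paging or a server-side cleanup):
import FirebaseFirestore

let db = Firestore.firestore()
// Illustrative paths only: delete a month document together with its
// "transactions" subcollection. uid and monthId are assumed to exist.
let monthRef = db.collection("users").document(uid)
    .collection("months").document(monthId)

monthRef.collection("transactions").getDocuments { snapshot, error in
    guard let documents = snapshot?.documents else { return }
    let batch = db.batch()
    documents.forEach { batch.deleteDocument($0.reference) } // remove the children first
    batch.deleteDocument(monthRef)                           // then the parent document
    batch.commit { error in
        if let error = error { print("Cleanup failed: \(error)") }
    }
}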
Subcollections are actually an improvement over the natural JSON data flow, but since Cloud Firestore is still in beta, some features (like deleting subcollections along with their documents) may be released when it graduates from beta, or later on.
The main advantage of using subcollections is that you save on data transfer: when you query a document's data, its subcollection data is not fetched, i.e. queries are shallow.
Related
As the title suggests, I would like to know how to get the total count of elements in a paginated and filtered collection.
I have seen that, for counting the documents in a collection, many recommend creating a statistics document that holds a counter of the documents in the collection.
But if I need to implement paged and filtered retrieval, how can I get the count of the total filtered items without having to retrieve them all?
Edit: October 20th, 2022
As of now, counting the documents in a collection, or the documents returned by a query, is possible without keeping a counter. You can count the documents using the new count() method, which:
Returns a query that counts the documents in the result set of this query.
This new feature was announced at this year's Firebase Summit. Keep in mind that this feature doesn't read the actual documents. So according to the official documentation:
For aggregation queries such as count(), you are charged one document read for each batch of up to 1000 index entries matched by the query. For aggregation queries that match 0 index entries, there is a minimum charge of one document read.
For example, count() operations that match between 0 and 1000 index entries are billed for one document read. For a count() operation that matches 1500 index entries, you are billed for 2 document reads.
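In the iOS SDK the aggregation looks roughly like this (a sketch, assuming a recent firebase-ios-sdk; the collection and field names are placeholders):
import FirebaseFirestore

let db = Firestore.firestore()
// Count the matching documents without downloading them.
let query = db.collection("expenses").whereField("month", isEqualTo: "2022-10")

query.count.getAggregation(source: .server) { snapshot, error in
    if let count = snapshot?.count {
        print("Matching documents: \(count)")
    }
}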
I have seen that, for counting the documents in a collection, many recommend creating a statistics document that holds a counter of the documents in the collection.
Yes, that is correct. It's very costly to count the number of documents in a collection every time you need that total. So it's best to have a field in a document that holds that number, increment it each time a new document is added, and decrement it each time a document is deleted.
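A minimal sketch of that counter approach in Swift (the "metadata/stats" path and the documentCount field are placeholders chosen only for illustration):
import FirebaseFirestore

let db = Firestore.firestore()
// Hypothetical stats document that holds the running total.
let counterRef = db.collection("metadata").document("stats")

// After adding a document elsewhere, bump the counter atomically...
counterRef.updateData(["documentCount": FieldValue.increment(Int64(1))])

// ...and after deleting one, decrement it.
counterRef.updateData(["documentCount": FieldValue.increment(Int64(-1))])
In practice you'd perform the add/delete and the counter update together in a batched write or transaction so the counter can't drift.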
But if I need to implement paged and filtered retrieval, how can I get the count of the total filtered items without having to retrieve them all?
There is no way you can know ahead of time how many documents exist in a collection without reading them all, or without reading a document that contains that information, as explained above.
Pagination in NoSQL databases is a little different than in SQL databases. In most modern applications, we paginate data using infinite scrolling. If you understand Java, you can take a look at my answer in the following post:
How to paginate Firestore with Android?
Here is also the official documentation regarding Firestore pagination that can be achieved using query cursors:
https://firebase.google.com/docs/firestore/query-data/query-cursors
If you understand Kotlin, I also recommend you check the following resource:
How to implement pagination in Firestore using Jetpack Compose?
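In Swift, the cursor-based pattern from the official query-cursors documentation looks roughly like this (a sketch; the collection and field names are placeholders):
import FirebaseFirestore

let db = Firestore.firestore()

// First page: the newest 25 items.
db.collection("items")
    .order(by: "timestamp", descending: true)
    .limit(to: 25)
    .getDocuments { snapshot, error in
        guard let lastDocument = snapshot?.documents.last else { return }

        // Next page: continue after the last document of the previous page.
        // Run this when the user scrolls to the bottom of the list.
        let nextPage = db.collection("items")
            .order(by: "timestamp", descending: true)
            .start(afterDocument: lastDocument)
            .limit(to: 25)
        nextPage.getDocuments { nextSnapshot, _ in
            print("Fetched \(nextSnapshot?.documents.count ?? 0) more items")
        }
    }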
I'm building an app that works like this: the user of the app is the manager of a team; they ask the team some questions and collect the data in the app. Monthly, a report is generated from this data. There is no use case/scenario where the user will need to see all the data at once, i.e. not filtered by month.
That being said, I thought about modelling the data this way:
- persons/{personId}:
    - name
- answersByPerson/{personId}:
    - personName
    - byMonth/{YYYYMM}: (using month as key)
        - month
        - collectedAnswers/{uuid}:
            - answer_to_q1 ... (these are all yes or no questions)
            - answer_to_qn
- aggregationsByPerson/{personId}: (this should be computed by a cloud function)
    - month
    - byMonth/{YYYYMM}: (also using month as key)
        - sum_q1 ... (count of answers with 'yes')
        - sum_qn
- reportByPerson/{personId}:
    - personName
    - month
    - score (computed from aggregations)
So I have these questions:
Is it bad for me to use year/month as keys to my documents? (I'd make sure in my app to overwrite data if the key exists)
Is it bad for me to reuse the personId as the key in the answersByPerson collection? The idea is that I wouldn't have to fetch the persons collection, nor filter the answers collection by personId.
Is it overengineering for me to use monthly buckets? I thought that maybe I'd save some money if I fetched collection('answersByPerson').doc('$personId').collection($month) instead of fetching collection('answersByPerson').doc('$personId').where(...).
Also, would it make sense for me to put the aggregations inside the answers collection? Would I be able to update it without using a cloud function, or could this lead to synchronization issues?
Edit: I've searched around and it seems that the term "bucketing" is not that common; I've taken it from this article.
Firestore charges for the number of documents read and the bandwidth consumed; it explicitly does not charge for the number of documents it has to search through. If you can write a query that gets exactly the documents you need from the combined collection, then the cost will be exactly the same for these two operations. Somewhat more uniquely: so will the performance, as Firestore's query performance depends only on the amount of data you retrieve and not on the size of the collection.
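To make that concrete, here is a rough Swift sketch of the two reads; the flat "answers" collection is a hypothetical alternative layout shown only for the cost comparison, and personId is assumed to exist:
import FirebaseFirestore

let db = Firestore.firestore()

// Monthly bucket: read one month's answers straight from its subcollection.
let bucketed = db.collection("answersByPerson").document(personId)
    .collection("byMonth").document("202301")
    .collection("collectedAnswers")

// Hypothetical flat alternative: one combined "answers" collection,
// filtered down to the same person and month.
let combined = db.collection("answers")
    .whereField("personId", isEqualTo: personId)
    .whereField("month", isEqualTo: "202301")

// Either way you are billed per document returned, not per document scanned.
bucketed.getDocuments { snapshot, _ in print("bucketed: \(snapshot?.count ?? 0)") }
combined.getDocuments { snapshot, _ in print("combined: \(snapshot?.count ?? 0)") }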
I have two Firestore collections, Users and Posts. Below are simplified examples of what the typical document in each contains.
*Note that the document IDs in the friends subcollection are equal to the document IDs of the corresponding user documents. Optionally, I could also add a uid field to the friends documents and/or the Users documents. Also, there is a reason, not relevant to this question, that we keep friends as a subcollection of each user, but if need be we can change it into a unified root-level Friends collection.
This setup makes it very easy to query for posts, sorted chronologically, by any given user by simply looking for Posts documents whose owner field is equal to the document reference of that user.
I achieve this in iOS/Swift with the following, though we are building this app for iOS, Android, and web.
// Make sure we have the signed-in user's uid before querying.
guard let uid = Auth.auth().currentUser?.uid else {
    print("No UID")
    return
}
// The user's document reference; posts store this reference in their "owner" field.
let firestoreUserRef = firestore.collection("Users").document(uid)
// The 25 most recent posts owned by the current user, newest first.
firestorePostsQuery = firestore.collection("Posts")
    .whereField("owner", isEqualTo: firestoreUserRef)
    .order(by: "timestamp", descending: true)
    .limit(to: 25)
My question is how to query Posts documents that have owner values contained in the user's friends subcollection, sorted chronologically. In other words, how to get the posts belonging to the user's friends, sorted chronologically.
For a real-world example, consider Twitter, where a given user's feed is populated by all tweets that have an owner property whose value is contained in the user's following list, sorted chronologically.
Now, I know from the documentation that Firestore does not support logical OR queries, so I can't just chain all of the friends together. Even if I could, that doesn't really seem like an optimal approach for anyone with more than a small handful of friends.
The only option I can think of is to create a separate query for each friend. There are several problems with this, however. The first is the challenge of smoothly presenting the results of many asynchronous fetches. The second is that I can't merge the data into chronological order without manually re-sorting the set on the client every time one of the query snapshots is updated (i.e., a real-time update).
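For concreteness, this is roughly what that per-friend approach would look like in Swift (just a sketch; friendRefs is assumed to have been loaded from the friends subcollection already):
import FirebaseFirestore

let db = Firestore.firestore()
// friendRefs ([DocumentReference]) is assumed to come from Users/{uid}/friends.
var mergedPosts: [QueryDocumentSnapshot] = []

for friendRef in friendRefs {
    db.collection("Posts")
        .whereField("owner", isEqualTo: friendRef)
        .order(by: "timestamp", descending: true)
        .limit(to: 25)
        .addSnapshotListener { snapshot, error in
            guard let documents = snapshot?.documents else { return }
            // Replace this friend's posts and re-sort the merged feed on the client.
            mergedPosts.removeAll { ($0.data()["owner"] as? DocumentReference) == friendRef }
            mergedPosts.append(contentsOf: documents)
            mergedPosts.sort {
                let lhs = ($0.data()["timestamp"] as? Timestamp)?.dateValue() ?? .distantPast
                let rhs = ($1.data()["timestamp"] as? Timestamp)?.dateValue() ?? .distantPast
                return lhs > rhs
            }
            // Reload the feed UI here.
        }
}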
Is it possible to build the query I am describing, or am I going to have to go with this less-than-optimal approach? This seems like a fairly common query use case, so I'll be surprised if there is not a way to do this.
Sorting chronologically is easy provided you are using a Unix timestamp, e.g. 1547608677790, with the .orderBy method. However, that leaves you with a potential mountain of queries to iterate through (one per friend).
So, I think you want to re-think the data store schema.
Take advantage of Cloud Functions for Firebase triggers. When a new post is written, have a cloud function calculate who should see it. Each user could have an array-type property containing all unread posts, read posts, etc.
Something like that would be fast and least taxing.
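As a rough sketch of the read side, assuming the function writes a feedPostIds array of post document IDs onto the user document (that field name, and paging the "in" query in small slices, are assumptions rather than an established schema):
import FirebaseFirestore

let db = Firestore.firestore()
// uid is assumed; "feedPostIds" is a hypothetical field written by the function.
db.collection("Users").document(uid).getDocument { snapshot, error in
    guard let postIds = snapshot?.data()?["feedPostIds"] as? [String],
          !postIds.isEmpty else { return }

    // "in" queries accept only a limited number of values, so read a small slice.
    let pageOfIds = Array(postIds.prefix(10))
    db.collection("Posts")
        .whereField(FieldPath.documentID(), in: pageOfIds)
        .getDocuments { snapshot, error in
            print("Loaded \(snapshot?.documents.count ?? 0) feed posts")
        }
}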
I have a Firestore database set up where I have a Users collection and Animals collection (these animals can be created by users at any time). For a particular user, I want to grab a random animal document that the user hasn't seen yet.
I don't believe it is possible to query for non-existing keys in Firestore, which makes this problem non-trivial for me. Is there a better way to do this than to keep a dictionary of all animal IDs for each user? My issue with that approach is scalability, since animals can be created by users at any time, and thus every user's animal dictionary would have to be updated for each new animal.
Thanks for any help in advance!
You won't be able to do this with a single query that returns a single document.
There is no sense of randomness in Firestore queries. If you want something random, you'll have to select that in your code from a set of items in memory. This means, at the very least, you're going to have to first figure out how to query for all the animals a user hasn't seen yet, then select randomly from that set in application code.
You are correct in assuming that you'll need some sort of record of who has seen which animal. In order to do that, you're going to need another collection of documents that records which animals each user has seen previously. You can then query it for the list of unseen animals and randomly select the final animal document from that list.
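A rough Swift sketch of that flow, assuming a seenAnimals subcollection under each user whose document IDs match the Animals document IDs (all names are illustrative):
import FirebaseFirestore

let db = Firestore.firestore()
// uid is assumed; "seenAnimals" is a hypothetical per-user subcollection.
db.collection("Users").document(uid).collection("seenAnimals").getDocuments { seenSnapshot, error in
    let seenIds = Set(seenSnapshot?.documents.map { $0.documentID } ?? [])

    // Read the animals and drop the seen ones in application code.
    db.collection("Animals").getDocuments { animalsSnapshot, error in
        let unseen = (animalsSnapshot?.documents ?? []).filter { !seenIds.contains($0.documentID) }

        // Pick the final animal at random on the client.
        if let randomAnimal = unseen.randomElement() {
            print("Show animal \(randomAnimal.documentID)")
        }
    }
}
Note that this still reads the whole Animals collection, which is exactly the scalability concern from the question; with a large collection you'd page through Animals or maintain a per-user list of unseen animals instead.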
We have an application that allows users to "follow" other users. When a user follows another, we register this data as a document within DocumentDB, like this:
{
    "followerId": "userUUID",
    "artistId": "artistUserUUID"
}
We now want to get a list of artists ordered by the number of followers they have. So I am looking to somehow ask the DB, based on these documents, to give me back an array of artistUserUUIDs, ordered by the number of followers each has registered (as expressed in documents like the example given above).
Alternatively, we are also open to adding an array property to the document of the artistUser themselves, though even in this scenario I am still unsure how to do an ORDER BY based on the count of a document's property (this property being an array of follower IDs).
I guess a workaround would be to add a stored procedure or trigger that updates a counter property within the artistUser document, but I'd like to validate whether there is a way to implement this counting feature natively, without such a trick.
Unless you denormalize the follower count into the artist user documents (as you suggest), you'll have to fetch every follower to accomplish your goal. Fetching every follower document may or may not be prohibitive, depending upon how many there are. If you fetch them only into a stored procedure rather than into your actual client, it's conceptually no less efficient than a SQL GROUP BY clause. Design your stored procedure to do the count and return only the table of artists and counts. A robust implementation would incrementally update your output table in pages and be able to restart where it left off after a stored procedure timeout. Look at my countDocuments example stored procedure in documentdb-mock, as well as my "Pattern for writing stored procedures" in the documentation for documentdb-utils, for how I typically accomplish this.