Resolve FK in Firestore - Firebase

I have some documents in Firestore with a few fields. For example, the "details" collection looks like this:
{
id: "",
fields1: "",
userFK: Reference to users collection
}
Now I need to resolve userFK on the fly, meaning I don't want to first fetch all the documents and then call userFK.get() on each one.
Is there any method for this, like the $lookup stage supported in MongoDB?
In some cases I also want to fetch documents from the "details" collection based on specific fields in users.

There is no way to get documents of multiple types from Firestore with a single read operation. To get the user document referenced by userFK you will have to perform a separate read operation.
This is normal when using NoSQL databases like Cloud Firestore, as they typically don't support any server-side equivalent of a SQL JOIN statement. The performance of loading these additional details is not as bad as you may think though, so be sure to measure how long it takes for your use-case before writing it off as not feasible.
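For illustration, here is a minimal sketch of that separate read step done on the client. It assumes in-memory maps stand in for the "details" and "users" collections, and that userFK holds the user's document ID rather than a DocumentReference; readUser() is hypothetical and would be one document read in a real app. Collecting the distinct IDs first means details that share a user don't cause repeat reads.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of resolving userFK on the client.
// USERS stands in for the "users" collection and readUser() for one
// document read; neither is a Firestore API.
class ClientSideJoin {
    record User(String id, String name) {}

    // userFK is simplified to the referenced user's document ID.
    record Detail(String id, String fields1, String userFK) {}

    static final Map<String, User> USERS = Map.of(
            "u1", new User("u1", "Ada"),
            "u2", new User("u2", "Linus"));
    static int reads = 0;

    static User readUser(String id) {
        reads++; // count each simulated document read
        return USERS.get(id);
    }

    // The separate read step: one read per distinct referenced user,
    // so details that share a user don't trigger repeat reads.
    static Map<String, User> resolveUsers(List<Detail> details) {
        Map<String, User> byId = new HashMap<>();
        for (Detail d : details) {
            byId.computeIfAbsent(d.userFK(), ClientSideJoin::readUser);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Detail> details = List.of(
                new Detail("d1", "a", "u1"),
                new Detail("d2", "b", "u2"),
                new Detail("d3", "c", "u1"));
        Map<String, User> users = resolveUsers(details);
        for (Detail d : details) {
            System.out.println(d.id() + " -> " + users.get(d.userFK()).name());
        }
        System.out.println("user reads: " + reads); // user reads: 2
    }
}
```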
If this additional load is prohibitive for a scenario, an alternative is to duplicate the necessary data of the user into each details document. So instead of only storing the reference to their document, you'd for example also store the user name.
This puts more work on the write operation, but makes the read operations simpler and more scalable. This is the common trade-off of space vs. time: in NoSQL databases you'll often find yourself trading space for time by storing duplicate data.
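A minimal sketch of that duplication at write time, assuming hypothetical in-memory maps for the two collections and a hypothetical writeDetail() helper (not a Firestore API). The detail keeps userFK and also stores a copy of the user's name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of duplicating the user's name into each
// details document at write time. USERS and DETAILS stand in for the two
// collections; writeDetail() is illustrative, not a Firestore API.
class DenormalizedWrite {
    record User(String id, String name) {}

    static final Map<String, User> USERS = new HashMap<>();
    static final Map<String, Map<String, Object>> DETAILS = new HashMap<>();

    static void writeDetail(String detailId, String fields1, String userId) {
        User user = USERS.get(userId); // the extra work moves to the write
        if (user == null) {
            throw new IllegalArgumentException("unknown user: " + userId);
        }
        Map<String, Object> doc = new HashMap<>();
        doc.put("fields1", fields1);
        doc.put("userFK", userId);        // keep the reference...
        doc.put("userName", user.name()); // ...plus a duplicated copy of the name
        DETAILS.put(detailId, doc);
    }

    public static void main(String[] args) {
        USERS.put("u1", new User("u1", "Ada"));
        writeDetail("d1", "hello", "u1");
        // One read of "d1" now has everything needed to display it.
        System.out.println(DETAILS.get("d1").get("userName")); // Ada
    }
}
```

The catch is that the copied name goes stale when the user renames themselves, so every copy then has to be rewritten.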
If you're new to NoSQL data modeling, I highly recommend:
NoSQL data modeling
Getting to know Cloud Firestore

Related

What is the best way to get multiple specific data from collections in firestore?

Is there a better way to get multiple specific documents from a collection in Firestore?
Let's say I have this collection:
--Feeds (collection)
  --feedA (doc)
    --comments (collection)
      --commentA (doc)
        users_in_conversation: [abcdefg, hijklmn, ...] //Field contains list of all users in the conversation
Then I need to retrieve the user data (name and avatar) from the Users collection. Currently I do one query per user, but this will be slow when there are many people in the conversation.
What's the best way to retrieve specific users?
Thanks!
Retrieving the additional names is actually a lot faster than most developers expect, as the requests can often be pipelined over a single HTTP/2 connection. But if you're noticing performance problems, edit your question to show the code you use, the data you have, and the performance you're getting.
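To illustrate why this is usually fast enough, here is a sketch that starts every lookup before waiting on any of them, instead of awaiting each one in turn. fetchUser() is a hypothetical stand-in that just sleeps to imitate latency, and the thread pool only models overlapping requests; the Firestore SDKs manage their own connections, so in a real app you would start the reads without blocking and wait for them together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: start all user lookups first, then collect the results.
// fetchUser() is hypothetical; it stands in for one document read.
class ParallelLookups {
    static String fetchUser(String uid) {
        try {
            Thread.sleep(100); // pretend network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "profile-of-" + uid;
    }

    static List<String> fetchAll(List<String> uids, ExecutorService pool) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String uid : uids) {
            // Kick off every lookup without waiting for the previous one.
            pending.add(CompletableFuture.supplyAsync(() -> fetchUser(uid), pool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            results.add(f.join()); // results keep the order of uids
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long start = System.nanoTime();
        List<String> profiles = fetchAll(List.of("abcdefg", "hijklmn", "opqrstu"), pool);
        long ms = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        System.out.println(profiles + " in ~" + ms + " ms");
    }
}
```

With three simulated 100 ms lookups, the total stays close to one lookup's latency rather than the sum of all three.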
A common way to reduce the need to load additional documents is by duplicating data. For example, if you store the name and avatar of the user in each comment document, you won't need to look up the user profile every time you read a comment.
If you come from a background in relational databases, this sort of data duplication may be very unexpected. But it's actually quite common in NoSQL databases.
You will of course then have to consider how to deal with updates to the user profile, for which I recommend reading: How to write denormalized data in Firebase. While this is for Firebase's other database, the same concepts apply to Cloud Firestore. I also generally recommend watching Getting to know Cloud Firestore.
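A minimal sketch of what handling such an update involves, assuming hypothetical in-memory comment objects that hold the author's ID plus copies of name and avatar. In Firestore the matching step would be a query such as whereEqualTo("authorId", ...) on a hypothetical authorId field, and each match a separate document write; updateAuthor() is illustrative only.

```java
import java.util.List;

// Hypothetical in-memory sketch of fanning out a profile change to every
// comment that holds a duplicated copy of the author's name and avatar.
class FanOutUpdate {
    static final class Comment {
        final String text;
        final String authorId;
        String authorName;   // duplicated from the user profile
        String authorAvatar; // duplicated from the user profile

        Comment(String text, String authorId, String authorName, String authorAvatar) {
            this.text = text;
            this.authorId = authorId;
            this.authorName = authorName;
            this.authorAvatar = authorAvatar;
        }
    }

    // Rewrites every duplicated copy for one author; returns how many
    // comment documents had to be written.
    static int updateAuthor(List<Comment> comments, String authorId,
                            String newName, String newAvatar) {
        int writes = 0;
        for (Comment c : comments) {
            if (c.authorId.equals(authorId)) { // in Firestore: a query on authorId
                c.authorName = newName;
                c.authorAvatar = newAvatar;
                writes++;
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Comment> comments = List.of(
                new Comment("Nice!", "u1", "Ada", "ada.png"),
                new Comment("Agreed", "u2", "Linus", "linus.png"),
                new Comment("Thanks", "u1", "Ada", "ada.png"));
        int writes = updateAuthor(comments, "u1", "Ada L.", "ada2.png");
        System.out.println(writes + " comments rewritten"); // 2 comments rewritten
    }
}
```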
I have tried a few solutions, and I think this one works best for this case:
When a user posts a comment, add the feed/post ID to an array field named discussions in that user's document.
When a user loads a feed/post, get the data of every user whose discussions array contains that feed/post ID (using array-contains).
It's efficient and needs fewer requests, since one query replaces a separate lookup per user.
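A hypothetical in-memory sketch of the two steps above. The Set models an array field updated without duplicates (as FieldValue.arrayUnion() would), and participants() does in memory what an array-contains query (whereArrayContains) would do on the server; class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory sketch of the "discussions" approach.
class DiscussionsIndex {
    static final class User {
        final String id;
        final String name;
        // Like an array field updated with arrayUnion: no duplicate entries.
        final Set<String> discussions = new LinkedHashSet<>();

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Step 1: when a user comments on a post, record the post ID on the user.
    static void onComment(User author, String postId) {
        author.discussions.add(postId);
    }

    // Step 2: when a post is loaded, find every user whose discussions
    // contain its ID (what an array-contains query evaluates server-side).
    static List<String> participants(Collection<User> users, String postId) {
        List<String> names = new ArrayList<>();
        for (User u : users) {
            if (u.discussions.contains(postId)) {
                names.add(u.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        User ada = new User("a", "Ada");
        User linus = new User("b", "Linus");
        onComment(ada, "feedA");
        onComment(linus, "feedB");
        System.out.println(participants(List.of(ada, linus), "feedA")); // [Ada]
    }
}
```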

Using both Firebase Realtime Database and Firestore with same ID

As the title suggests, I have a use case where I will write data to both Firestore and the Realtime Database. I am using the Realtime Database for operations that require live feedback to users, and Firestore to store data that will not really change but can be queried for more complex operations later on.
Because I need both databases, I would like to use the same unique ID when creating data in both, to make it easy to retrieve later. The issue is determining which generated ID will satisfy the other service.
My thinking is that since Realtime Database push IDs are based on a timestamp, using them in Firestore could create hot partitions, so indexing performance could suffer as the data grows. But if I use Firestore's generated IDs in the Realtime Database, I lose the sorted order that the Realtime Database gives to pushed data.
I was wondering what solutions people used to tackle this use case and what options are available to me. Thanks!
If you need to order data, then simply store timestamps as fields instead of depending on the time-based sort order of Realtime Database push IDs. You can do this easily in both databases. Firestore makes obsolete the idea that unique IDs have any meaning other than simply being unique.
If you make sure your unique IDs are truly random like Firestore's, then you won't have any problems with indexing or writing documents.
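A minimal sketch of that idea, assuming a 20-character ID drawn from [A-Za-z0-9] with SecureRandom, which is what Firestore's auto-generated IDs look like; any truly random ID would do. The same ID would then be used as the Realtime Database key and the Firestore document ID, and ordering comes from an explicit createdAt field. newId() and newRecord() are hypothetical helpers.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers: a random ID shared by both databases, plus an
// explicit timestamp field for ordering instead of relying on the ID.
class SharedIds {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // 20 uniformly random alphanumeric characters (assumed auto-ID shape).
    static String newId() {
        StringBuilder sb = new StringBuilder(20);
        for (int i = 0; i < 20; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Sort order lives in createdAt, not in the ID.
    static Map<String, Object> newRecord(String id, String payload) {
        Map<String, Object> data = new HashMap<>();
        data.put("id", id);
        data.put("payload", payload);
        data.put("createdAt", Instant.now().toEpochMilli());
        return data;
    }

    public static void main(String[] args) {
        String id = newId();
        // Hypothetically, this id becomes both the Realtime Database key
        // and the Firestore document ID.
        System.out.println(id + " -> " + newRecord(id, "hello"));
    }
}
```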

Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location where the data can exist, used to reconstruct the paths that require updating?
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
postID: 'someID',
locations: ( <---- collection
"path/to/post/location1",
"path/to/post/location2",
...
)
}
This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep track of this data for security reasons, unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape of trying to keep all that up to date as it is shared across a platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you are duplicating data, there is one thing to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place that it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all operations that we need to do in order to have consistent data, we add all operations to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that please see:
How to do a bulk update in Firestore
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass the corresponding document ID to each document() call in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore document paths, so you have to identify the documents by their IDs.
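As a sketch of that path reconstruction, assuming the stored entries carry the board owner's ID and board ID (BoardPin is a hypothetical record, not part of any SDK), each duplicated copy gets a full path string of the kind you would pass to a document() call:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: with no wildcards in document paths, each duplicated
// copy of a post is addressed by a full path built from stored IDs.
class PathBuilder {
    // One tracked location of a post: whose board, and which board.
    record BoardPin(String boardOwnerId, String boardId) {}

    static List<String> postCopyPaths(String postId, List<BoardPin> pins) {
        List<String> paths = new ArrayList<>();
        for (BoardPin pin : pins) {
            // users/{boardOwnerID}/boards/{boardID}/posts/{postID}
            paths.add(String.join("/", "users", pin.boardOwnerId(),
                    "boards", pin.boardId(), "posts", postId));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<BoardPin> pins = List.of(new BoardPin("o1", "b1"), new BoardPin("o2", "b7"));
        postCopyPaths("p42", pins).forEach(System.out::println);
        // users/o1/boards/b1/posts/p42
        // users/o2/boards/b7/posts/p42
    }
}
```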
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't think that is necessary either, since it requires extra read operations. Since everything in Firestore is billed by the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow reads or writes, or even apply separate rules for each CRUD operation you need.
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to store the operations in a key-value data structure. Let's assume we have a map that looks like this:
// Keyed by reference, so the same object can be written to several locations
Map<DocumentReference, Object> map = new HashMap<>();
map.put(reference1, customObject1);
map.put(reference2, customObject2);
map.put(reference3, customObject3);
//And so on
Iterate through the map, add each reference and its object to a batch, commit the batch, and that's it.
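Since that snippet depends on the Firestore SDK, here is a self-contained sketch of the same idea with a hypothetical in-memory stand-in for the batch: writes are staged and applied only on commit(). It keys the staged writes by document path, so one object duplicated to several locations gets one entry per location. DB, Batch and writeAll() are illustrative, not Firestore classes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory sketch of committing every duplicated copy together.
class BatchSketch {
    // Stand-in "database": document path -> document data.
    static final Map<String, Object> DB = new HashMap<>();

    // Stand-in for a write batch: writes are staged, and nothing reaches
    // DB until commit() applies them all together.
    static final class Batch {
        private final Map<String, Object> staged = new LinkedHashMap<>();

        Batch set(String path, Object data) {
            staged.put(path, data);
            return this;
        }

        void commit() {
            DB.putAll(staged);
            staged.clear();
        }
    }

    // Keyed by path, so one object duplicated to several locations
    // gets one entry per location.
    static void writeAll(Map<String, Object> writes) {
        Batch batch = new Batch();
        for (Map.Entry<String, Object> e : writes.entrySet()) {
            batch.set(e.getKey(), e.getValue());
        }
        batch.commit();
    }

    public static void main(String[] args) {
        String profile = "Ada L.";
        Map<String, Object> writes = new LinkedHashMap<>();
        writes.put("users/u1", profile);
        writes.put("posts/p1/comments/c1", profile);
        writes.put("posts/p2/comments/c7", profile);
        writeAll(writes);
        System.out.println(DB.size() + " documents updated"); // 3 documents updated
    }
}
```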

Firestore database model for Notion-like modules [duplicate]

I have watched videos and read the Cloud Firestore documentation from Google's Firebase service, but coming from the Realtime Database I can't figure this out.
I have a web app in mind in which I want to store my providers for different categories of products. I want to perform a search query through all my products to find which providers I have for a given product, and eventually access that provider's info.
I am planning to use this structure for this purpose:
Providers ( Collection )
  Provider 1 ( Document )
    Name
    City
    Categories
  Provider 2
    Name
    City
Products ( Collection )
  Product 1 ( Document )
    Name
    Description
    Category
    Provider ID
  Product 2
    Name
    Description
    Category
    Provider ID
So my question is, is this approach the right way to access the provider info once I get the product I want?
I know this is possible in the Realtime Database: using the provider ID, I could search for that provider in the providers section. But with Firestore I am not sure if it's possible or if this is the right approach.
What is the correct way to structure this kind of data in Firestore?
You need to know that there is no "perfect", "best" or "correct" solution for structuring a Cloud Firestore database. The best and correct solution is the one that fits your needs and makes your job easier. Bear in mind that there is also no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use-cases that your app requires. This means that what works for one app may be insufficient for another, so there is no solution that is correct for everyone. An effective structure for a NoSQL database is entirely dependent on how you intend to query it.
The way you are structuring your data looks good to me. In general, there are two ways in which you can achieve the same thing. The first is to keep a reference to the provider in the product object (as you already do); the second is to copy the entire provider object into the product document. This second technique is called denormalization and is quite a common practice when it comes to Firebase. We often duplicate data in NoSQL databases to suit queries that may not be possible otherwise. For a better understanding, I recommend you watch this video, Denormalization is normal with the Firebase Database. It's for the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update/delete a provider object, you need to do it in every place that it exists.
You might wonder now, which technique is best. In a very general sense, the best way in which you can store references or duplicate data in a NoSQL database is completely dependent on your project's requirements.
So you should ask yourself some questions about the data you want to duplicate or simply keep as references:
Is the data static, or will it change over time?
If it does, do you need to update every duplicated instance of the data so they all stay in sync? This is what I have also mentioned earlier.
When it comes to Firestore, are you optimizing for performance or cost?
If your duplicated data needs to change and stay in sync at the same time, you might have a hard time keeping all those duplicates up to date in the future. This may also mean spending a lot of money keeping all those documents fresh, as it will require a read and a write for each document for each change. In this case, holding only references will be the winning variant.
In this kind of approach, you write very little duplicated data (pretty much just the Provider ID). So that means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you will need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but definitely does require more code and more API calls.
If you need your queries to be very fast, you may prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. But depending on the data the client has to read, you may also be able to rely on local client caches to make this cheaper.
In this approach, you duplicate all the data for a provider in each product document. This means that the code to write this data is more complex, and you're definitely storing more data: one more provider object for each product document. And you'll need to figure out if and how to keep it up to date in each document. But on the other hand, reading a product document now gives you all the information about the provider in one read.
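To put rough numbers on the trade-off, here is a back-of-the-envelope sketch that counts document operations under the assumptions in its comments: 1 read per product view with duplication, 2 without; a provider change costs one read and one write per copy, plus the provider write. The inputs are hypothetical, and it ignores pricing differences between reads and writes and any savings from client caches.

```java
// Back-of-the-envelope sketch with hypothetical inputs: counts document
// operations only; reads and writes are billed at different rates, and
// client caching can lower the read counts.
class CostSketch {
    // Duplicated provider data: each product view is 1 read; each provider
    // change writes the provider plus reads and rewrites every copy.
    static long duplicated(long productViews, long providerChanges, long copiesPerProvider) {
        return productViews + providerChanges * (1 + 2 * copiesPerProvider);
    }

    // Provider ID only: each product view is 2 reads (product + provider);
    // each provider change is a single write.
    static long referenced(long productViews, long providerChanges) {
        return 2 * productViews + providerChanges;
    }

    public static void main(String[] args) {
        // 10,000 product views; the changed provider is copied into 200 products
        System.out.println(duplicated(10_000, 1, 200) + " vs " + referenced(10_000, 1));   // 10401 vs 20001
        System.out.println(duplicated(10_000, 50, 200) + " vs " + referenced(10_000, 50)); // 30050 vs 20050
    }
}
```

With rare provider changes the duplicated model needs about half the operations; with frequent changes the referenced model comes out ahead, which is exactly what the questions above are meant to surface.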
This is a common consideration in NoSQL databases: you'll often have to consider write performance and disk storage vs. reading performance and scalability.
Whether or not you duplicate some data is highly dependent on your data and its characteristics. You will have to think that through on a case-by-case basis.
So in the end, remember that both are valid approaches, and neither of them is inherently better than the other. It all depends on what your use-cases are and how comfortable you are with this technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or the Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. Unfortunately, in return you get more complex updates and higher storage/memory usage. Note, however, that extra calls are not expensive in the Firebase Realtime Database, but they are in Firestore. How much data duplication versus extra database calls is optimal for you depends on your needs and your willingness to let go of the "Single Point of Definition" mindset, which is a very subjective call.
After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's a trade-off between these two and your needs that determines the optimal solution for your app. Furthermore, to be even more precise you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation but that's software development. Everything is about measuring things.
Remember also that some database structures are easier to protect with security rules. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.
Please also take a look at my answer from this post where I have explained more about collections, maps and arrays in Firestore.

How to design a Cloud Firestore database schema

Migrating from the Realtime Database to Cloud Firestore requires a total redesign of the database. For this I created an example with some main design decisions.
See the picture and the database design in the spreadsheet below.
My two questions are:
1 - When I have a one-to-many relation, is it also an option to store the information as an array within the document? See line 8 in the database design.
2 - Should I include only a reference, or duplicate all the information in the one-to-many relation? See line 38 in the database model.
https://docs.google.com/spreadsheets/d/13KtzSwR67-6TQ3V9X73HGsI2EQDG9FA8WMN9CCHKq48/edit?usp=sharing
In general: keep the data store as shallow as possible, i.e., avoid subcollections and nesting.
Data can be related one-to-one, one-to-many, or many-to-many. Firestore is an automatically indexed realtime datastore. Firestore is often subscribed to rather than just queried once (the realtime nature of the system).
Regarding the Firestore data model, always ask "How will I query this data store?" Use subcollections, arrays, and maps sparingly (rarely) and only if you must (and you most likely don't need to). Use auto-IDs rather than human-readable IDs, e.g. use 000kztLDGafF4uKb8Cal rather than banana for document IDs.
As app functionality increases, server-side scripting with Cloud Functions for Firebase and/or the Admin SDK becomes an invaluable tool for managing (creating and indexing) many-to-many data relationships. For example, full-text search is not supported in Firestore, which at first looks like a barrier to implementing robust search functionality in your app.
In conclusion, try to avoid subcollections, nesting, arrays, and maps. Follow the "keep it simple, stupid" (KISS) principle. Once your app scales up and/or requires more functionality, server-side scripting can be used to keep your app responsive (fast) while offering robust features.
For Question 1 there's a solution in the Firestore docs:
https://cloud.google.com/firestore/docs/solutions/arrays
Instead of using an array, you use a map whose values are set to true, which allows you to query for them, like so:
teachers: {
"teacherid1": true,
"teacherid2": true,
"teacherid3": true
}
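Here is a minimal in-memory sketch of querying such a map, assuming a hypothetical Course record (the question's spreadsheet isn't reproduced here). The lookup on the teacher's key is what a whereEqualTo("teachers.teacherid1", true) filter would do on the server. Firestore has since added array-contains queries, which may make a plain array of IDs workable too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of the map-of-true pattern: each course
// stores teachers as {teacherId: true}, so "taught by X" is a key lookup.
class MapOfTrue {
    record Course(String name, Map<String, Boolean> teachers) {}

    static List<String> coursesTaughtBy(List<Course> courses, String teacherId) {
        List<String> names = new ArrayList<>();
        for (Course c : courses) {
            // Mirrors whereEqualTo("teachers." + teacherId, true)
            if (Boolean.TRUE.equals(c.teachers().get(teacherId))) {
                names.add(c.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Course> courses = List.of(
                new Course("Math", Map.of("teacherid1", true, "teacherid2", true)),
                new Course("Art", Map.of("teacherid3", true)));
        System.out.println(coursesTaughtBy(courses, "teacherid1")); // [Math]
    }
}
```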
And for Question 2, you just need to save the teacher IDs, because if you have those you can easily query for the corresponding data.
