Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that it can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location where the data can exist, which is then used to reconstruct the paths that require updating?
For example, creating a users_posts_boards collection that is firstly a collection of userIDs, with a sub-collection of postIDs, which finally has another sub-collection of boardIDs with a boardOwnerID. Would you then use those to reconstruct the paths of the duplicated data for a post (e.g. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
  postID: 'someID',
  locations: ( <---- collection
    "path/to/post/location1",
    "path/to/post/location2",
    ...
  )
}
This would mean that you would basically need to route all writes to Firestore through Cloud Functions that can keep track of this data for security reasons... unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape of trying to keep all of that up-to-date as posts are shared across the platform and a user then updates their profile.

I'm answering this question because of your request from here.
When you are duplicating data, there is one thing you need to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place where it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all the operations that we need to perform in order to keep the data consistent, we add every operation to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that please see:
How to do a bulk update in Firestore
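A minimal sketch of that idea, using the Firestore Web SDK in the same namespaced style as the other snippets on this page (the paths and the title field are hypothetical):

const batch = db.batch();

// Every known location of the duplicated post.
const locations = [
  db.doc("users/uidA/boards/board1/posts/post1"),
  db.doc("users/uidB/boards/board7/posts/post1"),
];

// Queue the same change for every copy; a batch commits atomically,
// so either all copies are updated or none are (max 500 writes per batch).
locations.forEach((ref) => batch.update(ref, { title: "New title" }));
batch.commit();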
What about creating a relational-like table for each unique location where the data can exist, which is then used to reconstruct the paths that require updating?
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Would you then use those to reconstruct the paths of the duplicated data for a post (e.g. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass the corresponding document ID to each document() call in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents; you have to identify the documents by their IDs.
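As a sketch, reconstructing one duplicate's reference from the stored IDs and adding it to the batch (the variable names are assumptions):

const ref = db
  .collection("users").doc(boardOwnerID)
  .collection("boards").doc(boardID)
  .collection("posts").doc(postID);

// The reconstructed reference joins the same batch as the other copies.
batch.update(ref, updatedFields);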
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
In my opinion that isn't necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should think again about this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow or deny reads and writes, and you can even apply separate rules to each CRUD operation you need.
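As an illustration only, a minimal sketch of such rules (the collection names come from the question; the auth condition is an assumption):

service cloud.firestore {
  match /databases/{database}/documents {
    // The tracker documents themselves stay locked down.
    match /posts_denormalization_tracker/{postID} {
      allow read, write: if false;

      // Signed-in users may add location entries, but nobody may
      // read, change, or remove them from the client.
      match /locations/{locationID} {
        allow create: if request.auth != null;
        allow read, update, delete: if false;
      }
    }
  }
}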
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to add the operations to a data structure of keys and values. Let's assume we have a map that looks like this:
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
// And so on
WriteBatch batch = db.batch();
for (Map.Entry<Object, DocumentReference> entry : map.entrySet()) {
    batch.set(entry.getValue(), entry.getKey()); // queue each write
}
batch.commit();
As shown, you iterate through the map, add all those keys and values to the batch, commit the batch, and that's it.

Related

Restrict specific object key values with authentication in Firestore

I have an object stored in the Firestore database. Among other keys, it has a userId of the user who created it. I now want to store an email address, which is a sensitive piece of info, in the object. However, I only want this email address to be retrieved by the logged in user whose userId is equal to the userId of the object. Is it possible to restrict this using Firebase rules? Or will I need to store that email address in a /private collection under the Firebase object, apply restrictive firebase rules, and then retrieve it using my server?
TL;DR: Firestore document reads are all or nothing; you can't retrieve a partial object from Firestore. So there is no feature at the rule level that gives you the granularity to restrict access to a specific field. The best approach is to create a subcollection with the sensitive fields and apply rules to it.
Taken from the documentation:
Reads in Cloud Firestore are performed at the document level. You either retrieve the full document, or you retrieve nothing. There is no way to retrieve a partial document. It is impossible using security rules alone to prevent users from reading specific fields within a document.
We solved this with two very similar approaches:
As you suggested, you can move your fields to a /private collection and apply rules there. However, this approach caused some issues for us because the /private collection is completely detached from the original doc. Resolving references implied multiple queries and extra calls to FS.
The second option, which is what the documentation also suggests, and IMHO a bit better, is to use a subcollection. This is pretty much the same as a collection, but it keeps a hierarchical relationship with the parent collection.
From the same docs:
If there are certain fields within a document that you want to keep hidden from some users, the best way would be to put them in a separate document. For instance, you might consider creating a document in a private subcollection
NOTE:
Those docs also include a good step-by-step guide on how to create this kind of structure in FS, how to apply rules to it, and how to consume the collections in various languages.
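As a rough sketch of the subcollection approach (inside an async function; the document and field names are assumptions):

// Public profile fields stay on the user document.
const publicSnap = await db.collection("users").doc(userId).get();

// The email lives under a /private subcollection, whose restrictive
// rules only let the owner (request.auth.uid == userId) read it.
const privateSnap = await db
  .collection("users").doc(userId)
  .collection("private").doc("details")
  .get();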

What is the best way to get multiple specific data from collections in firestore?

Is there any better way to get multiple specific pieces of data from a collection in Firestore?
Let's say we have this collection:
--Feeds (collection)
  --feedA (doc)
    --comments (collection)
      --commentA (doc)
    users_in_conversation: [abcdefg, hijklmn, ...] // field containing the list of all users in the conversation
Then I'll need to retrieve the user data (name and avatar) from the Users collection. Currently I do one query per user, but that will be slow when there are many people in the conversation.
What's the best way to retrieve specific users?
Thanks!
Retrieving the additional names is actually a lot faster than most developers expect, as the requests can often be pipelined over a single HTTP/2 connection. But if you're noticing performance problems, edit your question to show the code you use, the data you have, and the performance you're getting.
A common way to reduce the need to load additional documents is by duplicating data. For example, if you store the name and avatar of the user in each comment document, you won't need to look up the user profile every time you read a comment.
If you come from a background in relational databases, this sort of data duplication may be very unexpected. But it's actually quite common in NoSQL databases.
You will of course then have to consider how to deal with updates to the user profile, for which I recommend reading: How to write denormalized data in Firebase. While that post is about Firebase's other database (the Realtime Database), the same concepts apply to Firestore. In general I also recommend watching Getting to know Cloud Firestore.
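A sketch of that duplication when writing a comment (the field names here are assumptions):

// Each comment carries a copy of the author's name and avatar, so
// rendering comments needs no extra lookup in the Users collection.
db.collection("Feeds").doc(feedId).collection("comments").add({
  text: "Nice post!",
  authorId: uid,
  authorName: "Jane",               // duplicated from the user profile
  authorAvatar: "avatars/jane.png"  // duplicated from the user profile
});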
I have tried several solutions, but I think this one is the best for this case:
When a user posts a comment, write the feed/post ID into an array field named discussions in the user document.
When a user loads a feed/post, get all user documents whose discussions field contains that ID (using array-contains), as sketched below.
It's efficient and costs fewer read operations.
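A minimal sketch of that query (field and variable names as described above, inside an async function):

// One query returns every participant of the post,
// instead of one get() per user.
const participants = await db
  .collection("Users")
  .where("discussions", "array-contains", postId)
  .get();

participants.forEach((doc) => {
  const { name, avatar } = doc.data();
  // render each participant's name and avatar
});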

Resolve FK in firestore

I have some documents in Firestore that have some fields in them. For example, a collection "details" looks like this:
{
  id: "",
  fields1: "",
  userFK: Reference to users collection
}
Now I need to resolve userFK on the fly, meaning that I don't want to first fetch all the documents and then call userFK.get() for each one.
Is there any method for this? It would be like doing a $lookup, which is supported in MongoDB.
In some cases I even want to fetch documents from the "details" collection based on specific fields in users.
There is no way to get documents of multiple types from Firestore with a single read operation. To get the user document referenced by userFK you will have to perform a separate read operation.
This is normal when using NoSQL databases like Cloud Firestore, as they typically don't support any server-side equivalent of a SQL JOIN statement. The performance of loading these additional details is not as bad as you may think though, so be sure to measure how long it takes for your use-case before writing it off as not feasible.
If this additional load is prohibitive for a scenario, an alternative is to duplicate the necessary data of the user into each details document. So instead of only storing the reference to their document, you'd for example also store the user name.
This puts more work on the write operation, but makes the read operations simpler and more scalable. This is the common trade-off of space vs time, where in NoSQL databases you'll often find yourself trading time for space: so storing duplicate data.
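For concreteness, the two read operations described above might look like this (inside an async function; detailsId is assumed):

// First read: the details document itself.
const detailsSnap = await db.collection("details").doc(detailsId).get();

// Second read: follow the stored reference to the user document.
const userSnap = await detailsSnap.data().userFK.get();
console.log(userSnap.data());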
If you're new to NoSQL data modeling, I highly recommend:
NoSQL data modeling
Getting to know Cloud Firestore

Using Firestore document's auto-generated ID versus using a custom ID

I'm currently deciding on my Firestore data structure.
I'll need a products collection, and the products items will live inside of it as documents.
Here are my product's fields:
uniqueKey: string
description: array of strings
images: array of objects
price: number
QUESTION
Should I use Firestore's auto-generated IDs as the IDs of my documents, or is it better to use my uniqueKey (which I'll query for on many occasions) as the document ID? Is there a best option between the two?
I imagine that if I use my uniqueKey, it will make my life easier when retrieving a single document, but I'll have to query for more than 1 product on many occasions too.
Using my uniqueKey as ID:
db.collection("products").doc("myUniqueKey").get();
Using my Firestore auto-generated ID:
db.collection("products").where("uniqueKey", "==", "myUniqueKey").get();
Is this enough of a reason to go with my uniqueKey instead of the auto-generated one? Is there a rule of thumb here? What's the best practice in this case?
In terms of making queries from a client, using only the information you've given in the question, I don't see that there's much practical difference between a document get using its known ID, or a query on a field that is also unique. Either way, an index is used on the server side, and it costs exactly 1 document read. The document get() might be marginally faster, but it's not worthwhile to optimize like this (in my opinion).
When making decision about data modeling like this, it's more important to think about things like system behavior under load and security rules.
If you're reading and writing a lot of documents whose IDs have a sequential property, you could run into hotspotting on those writes. So, if you want to use your own ID, and you expect to be reading and writing them in that sequence under heavy load, you could have a problem. If you don't anticipate this to be the situation, then it likely doesn't matter too much whose ID you use.
If you are going to use security rules to limit access to documents, and you use the contents of other documents to help with that, you'll need to be able to uniquely identify those documents in your rules. You can't perform a query against a collection in rules, so you might need meaningful IDs that give direct access when used by rules. If your own IDs can be used easily this way in security rules, that might be more convenient overall. If you're forced to use Firestore's generated IDs, it might become inconvenient, difficult, or expensive to maintain a relationship between your IDs and Firestore's IDs.
In any event, the decision you're making is not just about which ID is "better" in a general sense, but which ID is better for your specific, anticipated situation, under load, with security in mind.
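For reference, the write side of the two options (product is a placeholder object):

// Custom ID: later reads can use a direct get() with no query.
db.collection("products").doc("myUniqueKey").set(product);

// Auto-generated ID: Firestore's random IDs are spread out by design,
// which avoids write hotspotting on sequential keys.
db.collection("products").add(product).then((ref) => console.log(ref.id));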

How to delete Single-field indexes that are generated automatically by Firestore?

update:
TL;DR:
If you have reached this point, you should recheck the way you build your DB.
Your document(s) probably grow over time (due to nested lists, etc.).
Original question:
I have a collection of documents that have a lot of fields. I do not query the documents, not even with simple queries.
I am using only
db.collection("mycollection").doc(docName).get().then(....);
to read the docs, so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically, and the number of fields causes the indexing limit to be exceeded. If I try to add a field to one of the documents, it throws this error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete this single-field indexing, or to tell Firestore to stop generating it. I found the single-field index settings in the Firestore console, but there is no way to disable them there, or to disable auto-indexing for a specific collection.
Any way to do it?
You can delete single-field indexes in the Firestore console.
See this answer for more up-to-date information on creating and deleting indexes:
Firestore composite index permutation explosion?
If you go into Indexes after selecting the Firestore database and then select "Single field" indexes, there is an "Add exemption" button which allows you to specify which fields in a collection (or sub-collection) have single-field indexes generated by Firestore. You have to specify the collection followed by the field, and you have to specify every field individually, as you cannot exempt a whole collection. There does not seem to be any checking for valid collection or field names.
The only way I can think of to check that this has worked is to do a query using the field: it should fail.
I do this on large string fields which contain normal text, as they would take a long time to index and I know I will never search on them.
Firestore creates two indexes for every simple field (ascending and descending), but it is also possible to create an exemption which removes one of these if you will never need the other sort order; this helps improve performance and makes it less likely that you hit the index limits. In addition, you can select whether arrays are indexed or not. If you create a lot of entries in an array, this can very quickly hit the Firestore limits on the number of index entries, so care has to be taken when using indexes. It will often be best to take the indexes off arrays, since the designer may have no control over how many array items are added, with the result that the maximum index limit is reached and the application gets an error, as the original poster explained.
You can also remove any simple indexes if you are not using them even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field that increases or decreases sequentially between documents) and you are not using it to enforce an ordering in your queries, there is a maximum write rate of 500 writes per second for the collection. In this case, the limit can be removed by removing the ascending and descending indexes on that field.
Note that, unlike in the Realtime Database, IDs created with auto-ID do not guarantee any ordering, as they are generated by Firestore to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means that a timestamp is often needed to establish ordering, but you may be able to design your collections/sub-collections data layout to avoid the need for a timestamp. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added, as sketched below.
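A hypothetical sketch of that last suggestion (the meta document and field names are assumptions, and whether this beats an indexed timestamp depends on your write rates):

// Write the post and record its ID in a metadata doc, atomically,
// so no sequentially indexed timestamp field is needed.
const postRef = db.collection("posts").doc(); // auto-ID
const batch = db.batch();
batch.set(postRef, { text: "hello" });
batch.set(db.collection("meta").doc("posts"), { lastPostId: postRef.id });
batch.commit();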
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing. Once you have added an exemption, it will be listed on the single-field indexes screen in the console.
See this link as well:
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is you can't do that right now with Firebase. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1MB per document.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as scalable as other options, especially if your data expands over time. With larger or growing lists, the document also grows, which can lead to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure of the data you are trying to save?
A workaround for your problem would be to create composite indexes or, as a last resort, to accept that Firestore may not be suited to the needs of your app, in which case the Firebase Realtime Database can be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe that the switch you are looking for currently exists, so I think that leaves the following:
Globally disable built-in indexes and create all indexes explicitly. Painful, and they have limits too.
A workaround where you treat your Firestore-unfriendly content like a BLOB, like so:
To store:
// Serialize the whole object once; Firestore then only sees (and
// indexes) the single content string instead of a zillion fields.
const objIn = { text: 'my object with a zillion fields' };
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };
To retrieve:
const objOut = JSON.parse(container.content);
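And a hypothetical round trip through Firestore (the collection and document names are assumptions):

db.collection("blobs").doc("myDoc").set(container);
db.collection("blobs").doc("myDoc").get()
  .then((snap) => JSON.parse(snap.data().content));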
