I have a collection of conversations, each with a hidden<Map> in which every participant is a key with a boolean value, so I can tell whether that participant archived the conversation on their end. The query looks like this:
store.conversations
.where( 'participants', 'array-contains', uid )
.where( `hidden.${uid}`, '==', false )
.orderBy( 'createdAt', 'desc' )
The problem arises when adding orderBy, which makes it a "range" query. Since each document has a different set of keys in the hidden<Map>, Firestore suggests the following index, which obviously wouldn't work:
participants Arrays
hidden.`48m6lKjwvKUOboAxlc0ppX2R7qF2` Ascending
createdAt Descending
How do I get around this? I guess flattening the Map would be a solution, but not the most elegant one. Any advice?
Firestore is suggesting the following, which obviously wouldn't work:
participants Arrays
hidden.`48m6lKjwvKUOboAxlc0ppX2R7qF2` Ascending
createdAt Descending
You can create such an index and it will work, but the problem arises if your app becomes popular and you have a large number of users. Because the indexed field is hidden.<uid>, you would have to create a separate index for every user, which is not a good idea because indexes are subject to limits. According to the official documentation regarding Firestore usage and limits:
Maximum number of composite indexes for a database: 200
That is a number that can be reached very quickly.
I guess flattening the Map would be a solution
You're guessing right. This practice is also called denormalization and is common when it comes to Firebase. If you are new to NoSQL databases, I recommend watching the video "Denormalization is normal with the Firebase Database" for a better understanding. It is for the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
Also, when you duplicate data, there is one thing you need to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update or delete an item, you need to do it in every place where it exists.
For more information please also see my answer from the following post:
What is denormalization in Firebase Cloud Firestore?
So you can denormalize your database and create conversations without the need of creating indexes. For your use-case, you should consider augmenting your data structure to allow a reverse lookup by creating a new collection or subcollection named userConversations that can hold as documents all the conversations that a user has. For a simple query, there is no index needed.
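A minimal sketch of the reverse lookup described above, in plain JavaScript with no SDK calls. The path layout userConversations/{uid}/conversations/{conversationId} and the helper name buildUserConversationWrites are assumptions made for illustration, not something the answer prescribes. The idea is that each participant gets their own copy of the conversation with a plain hidden boolean instead of a hidden.<uid> map key.

```javascript
// Hypothetical helper: build one denormalized write per participant.
// The path layout below is an assumed convention, not a Firestore requirement.
function buildUserConversationWrites(conversationId, conversation) {
  return conversation.participants.map((uid) => ({
    path: `userConversations/${uid}/conversations/${conversationId}`,
    data: {
      // A per-user flag replaces the hidden.<uid> map key.
      hidden: conversation.hidden[uid] === true,
      createdAt: conversation.createdAt,
    },
  }));
}

const writes = buildUserConversationWrites('c1', {
  participants: ['alice', 'bob'],
  hidden: { alice: false, bob: true },
  createdAt: 1547608677790,
});
console.log(writes.map((w) => w.path));
```

With a layout like this, every user's list is filtered and ordered on the same fixed field names (hidden and createdAt), so the fields involved no longer vary per user.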
I am aware it would be very difficult to query by a value that does not exist in an array but is there a way to do this without doing exactly that?
Here is my scenario: I have a subscription-based service where people can opt in and "follow" a specific artist. In my backend, this creates a subscription doc with their followerId, the id of the artist they want to follow (called artistId), and an array called pushed. The artist can add new releases, and then send each follower a notification about a specific song in the future. I would like to keep track of which follower has been pushed which release, and this is done in the aforementioned pushed array. I need a way to find which followers have already been pushed a specific release and so...
I was thinking of combining two queries but I am not sure if it is possible. Something like:
db.collection('subscriptions').where('artistId', '==', artistId)
db.collection('subscriptions').where('artistId', '==', artistId).where('pushed', 'array-contains', releaseId)
And then take the intersection of both query results and subtract from the 1st query to get the followers that have not been pushed a specific release.
Is this possible? Or is there a better way?
There is no way to query Firestore for documents that don't have a certain field or value. It's not "very difficult", but simply not possible. To learn more on why that is, see:
Firestore get documents where value not in array?
Firestore: how to perform a query with inequality / not equals
The Get to know Cloud Firestore episode on how Firestore queries work.
Your workaround is possible, and technically not even very complex. The only thing to keep in mind is that you'll need to load all artists. So the performance will be linear to the number of artists you have. This may be fine for your app at the moment, but it's something to definitely do some measurements on.
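The subtraction step of that workaround can be sketched in plain JavaScript. The two arrays stand in for the documents returned by the two queries in the question; the function name followersNotPushed is hypothetical, and only the followerId field from the question is assumed to be present on each document.

```javascript
// Subtract the "already pushed" result from the "all subscriptions" result.
// Both inputs stand in for the documents returned by the two queries.
function followersNotPushed(allSubscriptions, pushedSubscriptions) {
  const pushedIds = new Set(pushedSubscriptions.map((s) => s.followerId));
  return allSubscriptions
    .filter((s) => !pushedIds.has(s.followerId))
    .map((s) => s.followerId);
}

const all = [{ followerId: 'a' }, { followerId: 'b' }, { followerId: 'c' }];
const pushed = [{ followerId: 'b' }];
console.log(followersNotPushed(all, pushed));
```

Since the first query already returns every subscription for the artist, including its pushed array, the same result could also be computed from that single result set by checking each pushed array on the client.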
Typically my workaround is to track not what releases were pushed to a user, but when the last push was sent to a user. Say that a release has a "notifications sent timestamp" and each user has a "last received notifications timestamp". By comparing the two you can query for users who haven't received a notification about a specific release yet.
The exact data model for this will depend on your exact use-case, and you might need to track multiple timestamps for each user (e.g. for each artist they follow). But I find that in general I can come up with a reasonable solution based on this model.
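As a rough illustration of the timestamp comparison, here is a plain JavaScript sketch. The field names notificationsSentAt and lastNotifiedAt and the function name are assumptions chosen for this example; the answer only describes the two timestamps in words.

```javascript
// A follower still needs the notification if their last received
// notification is older than the release's "notifications sent" time.
// Field names here are assumed for illustration.
function followersMissingRelease(release, followers) {
  return followers
    .filter((f) => f.lastNotifiedAt < release.notificationsSentAt)
    .map((f) => f.followerId);
}

const release = { notificationsSentAt: 200 };
const followers = [
  { followerId: 'a', lastNotifiedAt: 100 },
  { followerId: 'b', lastNotifiedAt: 200 },
  { followerId: 'c', lastNotifiedAt: 300 },
];
console.log(followersMissingRelease(release, followers));
```

The point of this model is that the condition is a single "less than" comparison on one field, which is the kind of range filter Firestore can express, unlike "array does not contain".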
If you use Elasticsearch, you need to keep your database and the Elasticsearch server in sync. You also need to set up firewall rules on Google Cloud Platform to keep arbitrary requests away from your server, since they may incur bandwidth costs.
The not-in operator is now available in Firestore!
citiesRef.where('country', 'not-in', ['USA', 'Japan']);
See the docs for a full list of examples:
https://firebase.google.com/docs/firestore/query-data/queries#in_not-in_and_array-contains-any
citiesRef.where('country', 'not-in', [['USA']]);
Notice the double array around [['USA']]. You need this to filter out any docs that have 'USA' in the 'country' array.
Single array ['USA'] assumes that 'country' is a string.
If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
postID: 'someID',
locations: ( <---- collection
"path/to/post/location1",
"path/to/post/location2",
...
)
}
This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep a track of this data for security reasons....unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape trying to keep all that up-to-date as it is shared across a platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you duplicate data, there is one thing you need to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update or delete an object, you need to do it in every place where it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all operations that we need to do in order to have consistent data, we add all operations to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that please see:
How to do a bulk update in Firestore
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass to each document() method, the corresponding document id in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents. You have to identify the documents by their ids.
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't consider that necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow read or write access as a whole, or apply separate rules to each CRUD operation you need.
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to add the operations to a key-value data structure. Let's assume we have a map that looks like this:
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
//And so on
Iterate through the map, add all those keys and values to a batch, commit the batch, and that's it.
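The same idea can be sketched in JavaScript. Here the map is keyed by document reference with the new data as the value, and batch is any object exposing update(ref, data) and commit(), which is the shape of Firestore's WriteBatch. The helper names queueUpdates and createFakeBatch are hypothetical, and the fake batch exists only so the sketch runs without the SDK.

```javascript
// Queue one update per duplicated location; the caller commits afterwards.
// `copies` is a Map of document reference -> data to write.
function queueUpdates(batch, copies) {
  let queued = 0;
  for (const [ref, data] of copies) {
    batch.update(ref, data);
    queued += 1;
  }
  return queued;
}

// Stand-in for a real write batch, used only to make this sketch runnable.
function createFakeBatch() {
  const ops = [];
  return {
    ops,
    update(ref, data) { ops.push({ ref, data }); },
    commit() { return Promise.resolve(); },
  };
}

const batch = createFakeBatch();
const copies = new Map([
  ['users/u1/boards/b1/posts/p1', { title: 'New title' }],
  ['users/u2/boards/b7/posts/p1', { title: 'New title' }],
]);
console.log(queueUpdates(batch, copies)); // number of queued updates
batch.commit();
```

In a real app the map keys would be document references rather than path strings, and the batch would come from the Firestore SDK; all queued updates then succeed or fail together when the batch is committed.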
I have two Firestore collections, Users and Posts. Below are simplified examples of what the typical document in each contains.
*Note that the document IDs in the friends subcollection are equal to the document IDs of the corresponding user documents. Optionally, I could also add a uid field to the friends documents and/or the Users documents. Also, there is a reason, not relevant to this question, that we have friends as a subcollection of each user, but if need be we can change it into a unified root-level Friends collection.
This setup makes it very easy to query for posts, sorted chronologically, by any given user by simply looking for Posts documents whose owner field is equal to the document reference of that user.
I achieve this in iOS/Swift with the following, though we are building this app for iOS, Android, and web.
guard let uid = Auth.auth().currentUser?.uid else {
print("No UID")
return
}
let firestoreUserRef = firestore.collection("Users").document(uid)
firestorePostsQuery = firestore.collection("Posts").whereField("owner", isEqualTo: firestoreUserRef).order(by: "timestamp", descending: true).limit(to: 25)
My question is how to query Posts documents that have owner values contained in the user's friends subcollection, sorted chronologically. In other words, how to get the posts belonging to the user's friends, sorted chronologically.
For a real-world example, consider Twitter, where a given user's feed is populated by all tweets that have an owner property whose value is contained in the user's following list, sorted chronologically.
Now, I know from the documentation that Firestore does not support logical OR queries, so I can't just chain all of the friends together. Even if I could, that doesn't really seem like an optimal approach for anyone with more than a small handful of friends.
The only option I can think of is to create a separate query for each friend. There are several problems with this, however. The first being the challenges presenting (in a smooth manner) the results from many asynchronous fetches. The second being that I can't merge the data into chronological order without re-sorting the set manually on the client every time one of the query snapshots is updated (i.e., real-time update).
Is it possible to build the query I am describing, or am I going to have to go this less-than optimal approach? This seems like a fairly common query use-case, so I'll be surprised if there is not a way to do this.
Sorting chronologically is easy, provided you store a Unix timestamp (e.g. 1547608677790) and use the .orderBy method. However, that leaves you with a potential mountain of queries to iterate through (one per friend).
So, I think you want to re-think the data store schema.
Take advantage of Cloud Functions for Firebase Triggers. When a new post is written, have a cloud function calculate who all should see it. Each user could have an array-type property containing all unread-posts, read-posts, etc.
Something like that would be fast and least taxing.
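A small sketch of the fan-out step such a trigger might perform, written as a pure JavaScript function so it runs without Cloud Functions. The function name fanOutToFriends and the unreadByUser shape (user ID mapped to an array of post IDs) are assumptions for illustration; a real trigger would read and write these arrays in the user documents.

```javascript
// Given a new post ID and the owner's friend IDs, return updated
// unread-post lists with the new post added for each friend.
function fanOutToFriends(postId, friendIds, unreadByUser) {
  const updated = { ...unreadByUser };
  for (const friendId of friendIds) {
    const current = updated[friendId] || [];
    // Newest first; skip duplicates in case the trigger runs twice.
    updated[friendId] = current.includes(postId) ? current : [postId, ...current];
  }
  return updated;
}

const result = fanOutToFriends('p2', ['bob', 'carol'], { bob: ['p1'] });
console.log(result);
```

The cost moves from read time to write time: each new post causes one write per friend, but each friend's feed can then be read without issuing one query per friend.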
update:
TL;DR:
If you reached here, you should recheck the way you built your DB.
Your document(s) probably expand over time (due to nested lists, etc.).
Original question:
I have a collection of documents that have a lot of fields. I do not query the documents, not even with simple queries.
I only use
db.collection("mycollection").doc(docName).get().then(....);
in order to read the docs,
so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically and, because of the number of fields, the indexing limit is exceeded:
And if I try to add a field to one of the documents, it throws an error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field-indexing or to tell firestore to stop generating them.
I found this in firestore console:
but there is no way to disable this, or to disable auto-indexing for a specific collection.
Any way to do it?
You can delete simple indexes in Firestore.
See this answer for more up to date information on creating and deleting indexes.
Firestore composite index permutation explosion?
If you go to Indexes after selecting the Firestore database and then select "single" indexes, there is an Add exemption button which lets you specify which fields in a collection (or sub-collection) have simple indexes generated by Firestore. You have to specify the collection followed by the field. You specify every field individually, as you cannot exempt a whole collection. There does not seem to be any validation of collection or field names.
The only way I can think to check this has worked is to do a query using the field and it should fail.
I do this on large string fields which have normal text in them as they would take a long time to index and I know I will never search using this field.
Firestore creates two indexes for every simple field (ascending and descending), but you can create an exemption that removes one of them if you will never need it, which helps performance and makes it less likely that you hit the index limits. In addition, you can select whether arrays are indexed or not. If you create a lot of entries in an Array, this can very quickly hit the Firestore limits on the number of indexes, so care has to be taken when using indexes. It will often be best to take the indexes off Arrays, since the designer may have no control over how many Array items are added, with the result that the maximum index limit is reached and the application gets an error, as the original poster explained.
You can also remove any simple indexes if you are not using them even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field that increases or decreases sequentially between documents) and you are not using this to force a sequence in queries, then there is a maximum write rate of 500 writes per second for the collection. In this case, this limit can be removed by removing the increasing and decreasing indexes.
Note that unlike the Realtime Database, fields created with Auto-ID do not guarantee any ordering as they are generated by firestore to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means that a timestamp is often needed to generate ordering but you may be able to design your collections / sub-collections data layout to avoid the need for a timestamp. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added.
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing (see screenshot below).
Once you have added one exemption, then you will get this screen.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is you can't do that right now with Firebase. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1MB per document.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as
scalable as other options, especially if your data expands over time.
With larger or growing lists, the document also grows, which can lead
to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure data you are trying to save?
A workaround for your problem would be to create compound indexes. As a last resort, Firestore may not be suited to the needs of your app, and the Firebase Realtime Database may be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe that there currently exists the switch that you are looking for, so I think that leaves the following,
Globally disable built-in indexes and create all indexes explicitly. Painful and they have limits too.
A workaround where you treat your Cloud Firestore unfriendly content like a BLOB, like so:
To store,
const objIn = { text: 'my object with a zillion fields' };
// Serialize the whole object into a single string field.
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };
To retrieve,
const objOut = JSON.parse(container.content);