I want to be sure that little or no Firestore storage is used for indexing an array containing many maps. From what I understand after reading about Firestore index types, no indexes are created for an array of maps in a document, since it cannot be queried. Am I right in thinking this?
For example, here is an image of the array of maps:
There will be a lot of map elements in those progressionArray arrays but not enough to exceed 1MB per document. Since all progression data always needs to be loaded by the user, it seems best to me to store this data in an array to minimize Firestore reading costs (and index storage costs). Also there is no need to index this data since it will always all be loaded once by the user.
What are the index storage costs associated with this progressionArray? Are they zero, as I think, since it cannot be queried?
Thank you!
The documentation says: “A single-field index stores a sorted mapping of all the documents in a collection that contain a specific field.” So indexes will be created for arrays of maps.
You can create an exemption for single field indexes as explained here.
The only cost the indexes have is the amount of storage it takes to save them. You can calculate the index cost with the values specified in this document.
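To see why an array of maps is not index-free, here is a rough JavaScript model of how automatic single-field indexing turns one document into index entries. It is a sketch based on my reading of the documented defaults (ascending and descending entries per scalar field, recursive indexing of map subfields, one array-contains entry per distinct array element), not Firestore's actual implementation; `countDefaultIndexEntries` and the sample document are hypothetical.

```javascript
// Rough model of Firestore's automatic single-field index entries for one document.
// Assumption: paraphrases the documented defaults; handles plain JSON-like values only
// (Dates, Timestamps, GeoPoints, etc. are not modelled).
function isPlainMap(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function countDefaultIndexEntries(fields) {
  let entries = 0;
  for (const value of Object.values(fields)) {
    if (Array.isArray(value)) {
      // One array-contains entry per distinct element. Maps are compared via
      // JSON.stringify here, which is sensitive to key order (a simplification).
      entries += new Set(value.map((el) => JSON.stringify(el))).size;
    } else if (isPlainMap(value)) {
      // Map subfields are indexed recursively.
      entries += countDefaultIndexEntries(value);
    } else {
      // Scalar field: one ascending and one descending entry.
      entries += 2;
    }
  }
  return entries;
}

// Hypothetical progress document with an array of three distinct maps.
const progressDoc = {
  userId: 'abc',
  progressionArray: [
    { level: 1, score: 10 },
    { level: 2, score: 30 },
    { level: 3, score: 25 },
  ],
};
console.log(countDefaultIndexEntries(progressDoc)); // 5
```

Under this model, every new distinct map pushed into `progressionArray` adds one more index entry (and its storage), which is why an exemption is the way to bring that cost to zero.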
Related
Is there a way to get all documents whose array field does not contain one or more values? There is "array-contains", but is there something like "array-not-contains"?
Firestore can only run queries that are backed by indexes; that is what lets every query scale to billions of documents without performance problems.
Indexes work by recording values that exist in your data set. An index can't possibly be efficient if it tracks things that don't exist, because the universe of non-existent values is extremely large compared to your data set and can't be indexed as such. Querying for the non-existence of some value would require a scan of all your documents, and that doesn't scale.
I don't think that is possible at the moment. I would try looking at this blog post for reference.
better arrays in cloud firestore
You might need to convert your array to an object so that you can query by (property === false)
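A minimal sketch of that conversion, assuming the set of possible values is small and known up front (`toFlagMap` and `allTags` are hypothetical names). Every document must store an explicit `false` for each absent value, because an equality filter on a missing key does not match the document at all.

```javascript
// Hypothetical helper: turn an array field into a map of boolean flags over a known
// set of possible values, so each document explicitly records `false` for absent values.
function toFlagMap(values, universe) {
  const present = new Set(values);
  const flags = {};
  for (const candidate of universe) {
    flags[candidate] = present.has(candidate);
  }
  return flags;
}

const allTags = ['red', 'green', 'blue'];
const item = { name: 'item1', tags: toFlagMap(['red'], allTags) };
console.log(item.tags); // { red: true, green: false, blue: false }

// The "does not contain green" query would then be an equality filter on a map subfield,
// e.g. with the namespaced web SDK: db.collection('items').where('tags.green', '==', false)
```

The trade-off is that adding a new possible value means rewriting every document, and each flag becomes its own indexed subfield.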
Assuming your collection has a small number of documents, you could store all of their IDs in another document using an onCreate Cloud Function trigger, download that document on the client, and do the filtering client-side. You could also do all of this inside a Cloud Function if you're worried about performance.
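The client-side filtering step could look like the sketch below. It assumes the aggregated document stores each document's array values next to its ID, in a hypothetical `{ [docId]: values }` shape; `idsNotContaining` is likewise a hypothetical name.

```javascript
// Hypothetical shape of the aggregated document maintained by an onCreate trigger:
// { [docId]: arrayOfValues }. Returns the ids whose array contains none of `excluded`.
function idsNotContaining(indexDoc, excluded) {
  const excludedSet = new Set(excluded);
  return Object.entries(indexDoc)
    .filter(([, values]) => !values.some((v) => excludedSet.has(v)))
    .map(([id]) => id);
}

const indexDoc = {
  a1: ['red', 'green'],
  b2: ['blue'],
  c3: [],
};
console.log(idsNotContaining(indexDoc, ['red'])); // [ 'b2', 'c3' ]
```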
You'll have one extra read, but that's no big deal. Each document can hold up to 1 MiB, which is a lot, so you shouldn't worry about it too much; if the IDs do get too big, you can split them across several documents and merge them on the client or in the Cloud Function.
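Splitting and merging could be sketched like this. It assumes each ID costs about its UTF-8 length plus 1 byte (Firestore's string-size rule) and that you pick a budget with headroom below 1 MiB for the document name and overhead; `chunkIdsBySize` and `mergeChunks` are hypothetical helpers.

```javascript
// Split ids into groups whose approximate encoded size stays under a byte budget,
// so each group can be written to its own document.
// Assumption: size per id ≈ UTF-8 length + 1. An id larger than the budget
// still gets a chunk of its own.
function chunkIdsBySize(ids, maxBytes) {
  const encoder = new TextEncoder();
  const chunks = [];
  let current = [];
  let currentBytes = 0;
  for (const id of ids) {
    const size = encoder.encode(id).length + 1;
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(id);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Merge the chunks read back from several documents into one list.
function mergeChunks(chunks) {
  return chunks.flat();
}

const chunks = chunkIdsBySize(['aaaa', 'bbbb', 'cccc'], 10);
console.log(chunks); // [ [ 'aaaa', 'bbbb' ], [ 'cccc' ] ]
console.log(mergeChunks(chunks)); // [ 'aaaa', 'bbbb', 'cccc' ]
```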
This works very well for small sets of data, but if you're expecting millions of documents, then there isn't much you can do.
Firestore recently added support for a not-in clause.
citiesRef.where('country', 'not-in', ['USA', 'Japan']);
That will get every document where country exists and has a value other than 'USA', 'Japan', or null.
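The following plain-JavaScript model (not the Firestore SDK) spells out which documents that filter keeps: the field must exist, must not be null, and must not equal any listed value. Because the comparison is against the whole field value, applying it to an array field compares entire arrays, so it does not behave as an "array-not-contains". `matchesNotIn` is a hypothetical helper, and `===` equality is a simplification that only fits scalar values.

```javascript
// Local model of the not-in filter described above.
function matchesNotIn(doc, field, excluded) {
  if (!(field in doc)) return false; // missing field: not returned
  const value = doc[field];
  if (value === null) return false; // null: not returned
  return !excluded.includes(value); // whole-value comparison
}

const cities = [
  { name: 'SF', country: 'USA' },
  { name: 'Tokyo', country: 'Japan' },
  { name: 'Paris', country: 'France' },
  { name: 'Nowhere', country: null },
  { name: 'Unknown' },
];
console.log(cities.filter((c) => matchesNotIn(c, 'country', ['USA', 'Japan'])).map((c) => c.name));
// [ 'Paris' ]
```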
Firestore charges for the storage used by indexes. If I have a structure containing a massive list of ratings that different users gave, with the user ID as the key and the rating as the value, will that create too many automatically generated indexes? Is there a good structure for this?
For example, in the collection 'ratings', I shard the individual ratings that each user gives into different documents using a complex sharding mechanism I made, which fills a document up to the maximum of around 20k fields and then starts filling up another document. Say I have 5 documents, each filled with 20k fields. One of those documents would look like this:
uid1: 3.3
uid2: 5
uid3: 1.234
...
Is there another structure I should use to store lots of individual 'fields' in Firestore? I don't want to use a separate document for each rating either, as that is too expensive. Arrays aren't big enough to store lots of ratings either.
Arrays aren't big enough to store loads of ratings either
The problem isn't the arrays; the problem is that documents have limits on how much data you can put into them. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB of data in total in a single document. When it comes to storing text, you can store quite a lot, but as your array gets bigger, be careful about this limitation.
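To check whether a ratings document stays under that limit, you can estimate its size. This sketch follows the Firestore storage-size rules as I understand them (string = UTF-8 bytes + 1, number = 8, boolean and null = 1, document name = sum of path segment sizes + 16, plus 32 bytes per document); treat it as an approximation and verify against the official storage size documentation. The 28-character user IDs and the shard path are assumptions for the example.

```javascript
// Approximate Firestore document size (assumed rules, see lead-in).
const encoder = new TextEncoder();

function stringSize(s) {
  return encoder.encode(s).length + 1;
}

function valueSize(value) {
  if (value === null || typeof value === 'boolean') return 1;
  if (typeof value === 'number') return 8;
  if (typeof value === 'string') return stringSize(value);
  if (Array.isArray(value)) return value.reduce((sum, v) => sum + valueSize(v), 0);
  if (typeof value === 'object') {
    // Map: each key's string size plus its value's size.
    return Object.entries(value).reduce((sum, [k, v]) => sum + stringSize(k) + valueSize(v), 0);
  }
  throw new Error(`Unsupported type: ${typeof value}`);
}

function documentSize(pathSegments, fields) {
  const nameSize = pathSegments.reduce((sum, seg) => sum + stringSize(seg), 0) + 16;
  return nameSize + valueSize(fields) + 32;
}

// 20,000 ratings keyed by hypothetical 28-character user IDs ("uid" + 25 digits).
const ratings = {};
for (let i = 0; i < 20000; i++) {
  ratings[`uid${String(i).padStart(25, '0')}`] = 4.5;
}
const size = documentSize(['ratings', 'shard1'], ratings);
console.log(size, size <= 1048576); // 740063 true
```

Even when the byte size fits, each key is still a separately indexed field unless you add an exemption.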
According to the official documentation regarding modelling data in Cloud Firestore:
Cloud Firestore is optimized for storing large collections of small documents.
So trying to shard a collection by filling up documents one by one is not such a good idea.
If you are trying to add ratings from multiple users to a single document, in other words, store a large amount of data in a single document that can be updated by lots of users, there is another limitation to take care of: you are limited to 1 write per second on each document. So if lots of users are all trying to write to the same document at once, you might start to see some of these writes fail. Be careful about this limitation too.
My recommendation is to store those ratings in an array if you expect the document to stay within the 1 MiB limit; otherwise, store each rating as a separate document in a collection.
Update:
TL;DR: if you ended up here, you should reconsider the way you structure your database.
Your document(s) probably grow over time (due to nested lists, etc.).
Original question:
I have a collection of documents that have a lot of fields. I never query these documents, not even with simple queries.
I only use:
db.collection("mycollection").doc(docName).get().then(....);
to read the documents,
so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically, and because of the number of fields, the indexing limits are exceeded:
If I try to add a field to one of the documents, it throws an error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field indexes or to tell Firestore to stop generating them.
I found this in the Firestore console:
but there is no way to disable it, i.e. to turn off automatic indexing for a specific collection.
Is there any way to do it?
You can delete simple indexes in Firestore.
See this answer for more up to date information on creating and deleting indexes.
Firestore composite index permutation explosion?
If you go into Indexes after selecting the Firestore database and then select "single" indexes, there is an Add exemption button that lets you specify which fields in a collection (or subcollection) have simple indexes generated by Firestore. You have to specify the collection followed by the field, and you must specify every field individually, as you cannot specify a whole collection. There does not seem to be any validation of collection or field names.
The only way I can think of to check that this has worked is to run a query on the field, which should then fail.
I do this on large string fields containing ordinary text, as they would take a long time to index and I know I will never search on these fields.
Firestore creates two indexes for every simple field (ascending and descending), but it is also possible to create an exemption that removes one of them if you will never need it. This improves performance and makes it less likely that you hit the index limits. You can also choose whether arrays are indexed. If you add a lot of entries to an array, you can very quickly hit the Firestore limits on the number of indexes, so take care with indexes on arrays. It will often be best to turn off indexing on arrays, since the designer may have no control over how many items are added to them; otherwise the maximum index limit can be reached and the application will get an error, as the original poster described.
You can also remove any simple indexes you are not using, even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field whose value increases or decreases sequentially between documents) and you are not using it to order query results, then there is a maximum write rate of 500 writes per second for the collection. In this case, you can remove this limit by removing the ascending and descending indexes on that field.
Note that, unlike in the Realtime Database, document IDs created with auto-ID do not guarantee any ordering, as Firestore generates them to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means a timestamp is often needed for ordering, but you may be able to design your collections/subcollections to avoid needing one. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added.
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing (see screenshot below).
Once you have added one exemption, then you will get this screen.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is that you can't do that with Firebase right now. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1 MiB per-document limit.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as scalable as other options, especially if your data expands over time. With larger or growing lists, the document also grows, which can lead to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the data structure you are trying to save?
A workaround for your problem would be to create compound indexes, or, as a last resort, Firestore may not be suited to the needs of your app, and Firebase Realtime Database may be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe the switch you are looking for currently exists, so I think that leaves the following options:
Globally disable built-in indexes and create all indexes explicitly. This is painful, and those indexes have limits too.
A workaround where you treat your Firestore-unfriendly content like a BLOB, like so:
To store,
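// The object with a huge number of fields
const objIn = { text: 'my object with a zillion fields' };
// Serialize it into one string, so the stored document has a single field
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };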
To retrieve,
const objOut = JSON.parse(container.content);
So in the limits section (https://firebase.google.com/docs/firestore/quotas) of the new Firestore product from Firebase it says:
Maximum write rate to a collection in which documents contain sequential values in an indexed field: 500 per second
We're pretty confused as to what that actually entails.
If we have, say, a root-level collection called users with 10 million entries in it, will this rate affect this collection in such a way, so only 500 users can update their data in any given second?
Can anyone clarify?
Sorry for the confusion; an example might help.
If your user documents contained a last-updated timestamp and you index on that timestamp then each new write would end up clustering around the same value (now) creating a hotspot in the index.
Similarly if you somehow assigned users a sequential value like a place in line or something like that this would also create a hotspot.
Incidentally, this is why generated document IDs are random strings. This evenly distributes the writes on the primary key index.
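To make the hotspot point concrete, this Node.js sketch (it uses `crypto.randomInt`, available since Node 14.10) compares keys built from sequential timestamps with random 20-character alphanumeric IDs, the same shape as Firestore auto-IDs. `randomId` is a hypothetical stand-in, not the SDK's generator, and counting distinct leading characters is only a crude proxy for how widely writes spread across an index's key range.

```javascript
const { randomInt } = require('crypto');

// Random 20-character ID from [A-Za-z0-9], similar in shape to Firestore auto-IDs.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function randomId(length = 20) {
  let id = '';
  for (let i = 0; i < length; i++) id += ALPHABET[randomInt(ALPHABET.length)];
  return id;
}

// How many distinct prefixes a set of keys uses (crude proxy for index key ranges hit).
function distinctPrefixes(keys, prefixLength = 1) {
  return new Set(keys.map((k) => k.slice(0, prefixLength))).size;
}

const base = 1700000000000; // fixed example timestamp in milliseconds
const sequentialKeys = Array.from({ length: 1000 }, (_, i) => String(base + i));
const randomKeys = Array.from({ length: 1000 }, () => randomId());

console.log(distinctPrefixes(sequentialKeys, 1)); // 1: every write lands in the same range
console.log(distinctPrefixes(randomKeys, 1)); // typically close to 62
```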
If you avoid these kinds of patterns, the sky's the limit, though during the beta you'd hit the database-wide limit.
A quick additional note: for the moment, all properties are indexed by default, so if you had a last-updated timestamp it would necessarily be indexed, and you would not be able to avoid the hotspotting.
Index disablement will be available down the road though.