Indexes on firestore collections with the same name at different "paths" - firebase

In my firestore database, I use the same collection name in different parts of my hierarchy. For example, imagine a stackoverflow-like site with the following 2 collections
/questions/{questionId}/votes/
/questions/{questionId}/answers/{answerId}/votes/
So now I want to create an index on one of these 2 collections. I would expect firestore to require some kind of "path-with-wildcards" like I've used above to identify the data to be indexed. However, instead, they only require the collection name: in this case, "votes".
So if I put an index on "votes" does it apply to both of these collections? Is there any way to put an index on one of these collection and not the other? Is it a best practice to use unique collection names to avoid this issue?

TL;DR:
Yes. Indexes are based on the collection id. This applies to both the ones we create automatically for you on single fields, as well as the composite indexes you create manually. If they are semantically different indexes we recommend you give them unique ids, so you could use question_votes and answer_votes.
More Info
Collection id is the identifier of the collection, excluding the full path. In your case, this is votes as you've noted.
The queries we currently serve use the subset of indexes for a specific path, although we have plans in the future to allow you to do a query that spans all collections with the same collection id (the collection group). This small bit of info adds some context to why.
A second reason is there is a 200 composite index limit in the system, so if someone had a data model structured like the following, /users/{user_id}/blog_posts/{post_id}, there would be no real way for them to create composite indexes on blog_posts for more than a handful of users (not to mention the operational burden of creating new indexes for every user!)

Related

Exclude list from another list of documents in Firestore - Swift

I have 2 collections in Firestore:
In the first I have the "alreadyLoaded" user ids,
In the second I have all userIDs,
How can I exclude the fist elements from the second elements making a query in Firestore?
the goal is to get only users that I haven't already loaded (optionally paginating the results).
Is there an easy way to achieve this using Firestore?
EDIT:
The number of documents I'm talking about will eventually become huge
This is not possible using a single query at scale. The only way to solve this situation as you've described is to fully query both collections, then write code in the client to remove the documents from the first set of results using the documents in the second set of results.
In fact, it's not possible to involve two collections in the same query at the same time. There are no joins. The only way to exclude documents from a query is to use a filter on the contents of the documents in the single collection.
Firestore might not be the best database for this kind of requirement if the collections are large and you're not able to precompute or cache the results.

Can I get all documents in a group of Firestore collections based on a list of ids?

I have a user collection. Each user has a list of collections with different names, ie "activities", "time", "distance", "sensor-a", "sensor-b", etc. The id of each document in these collections is common. So for example the all the documents with the same id are associated with each other.
Each document in the "activities" collection has properties that I would search on, while "time", "distance", and the "sensor-x" documents are for the most part a list of data points for various sensors. So for example the first element in each list correspond to each other, ie sensor-a[42] is the sensor reading at time[42], distance[42] and sensor-b[42].
So the data looks something like this:
user/100/activites/200
user/100/distance/200
user/100/time/200
user/100/sensor-a/200
user/100/sensor-b/200
user/100/activites/201
user/100/distance/201
user/100/time/201
user/100/sensor-a/201
user/100/sensor-b/201
I would like to be able to filter out a list of "activities" based on some property of the activity and then get all the documents from the other collections that have the same id as those I filtered.
I know I can create a list of ids I am interested in and then simply get each document one at a time but that does not seem very efficient.
I know that Firestore supports an in query method but that seems to be limited to a maximum of 10 entries, ie my filtered list of activities would need to be 10 or less.
I also see that Firestore has collection groups but they seem to require that all the collections have the same name. I think I could use this but I think I would need to restructure my data, probably adding a level of hierarchy, such as "activities", "distance", and "time" each have a data document that contains the current collections.
Is there any efficient way I can do this query with restructuring my data?
A single Firestore read operation can only read documents from a single collection, or from all collections with the same name. There is no way to otherwise read from multiple collections. The typically workaround is to perform a separate read for each collection, and then merge the results in your application code.
But here I would consider storing all the activity types for a single user in a single subcollection, and giving them a activityType field. One of the unique benefits of using Firestore is the performance of a query does not depend on the number of documents that it needs to consider, so there really is no penalty for combining the activities in a single subcollection.

Pagination with Filtering using Query Operation in DynamoDB Template

I would like to be able to filter a pagination result using query operation before the limit is taken into consideration.Is there any suggestion to get right pagination on filtered results?
I would like to implement a DynamoDB Scan OR Query with the following logic:
Scanning -> Filtering(boolean true or false) -> Limiting(for pagination)
However, I have only been able to implement a Scan OR Query with this logic:
Scanning -> Limiting(for pagination) -> Filtering(boolean true or false)
Note: I have already tried Global Secondary Index but it didn't work in my case Because I have 5 different attributes to filter and limit.
Unfortunatelly DynamoDB is not capable to do this, once you do Query on one of your indexes, it will read every single item that satisfies your partition and sort key.
Lets check your example - You have boolean and you have index over that field. Lets say 50% of items are false and 50% are true. Once you search by that index you will read through 50% of all items in table (so its almost like SCAN). If you set up limit, it will read only that number of items and then it stops. You cannot use the combination of limit and skip/page/offset like in other databases.
There is some level of pagination https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html but it does not allow you to jump to i.e. page 10, it only allows you go through all the pages one by one. Also I am not sure how it is priced, maybe internally the AWS will go through all the items before preparing the results for you, so you will pay for reading 50% of whole table even if you stop iterating before you reach the end.
There is also the limitation that index can have maximum of 2 fields (partition, sort).
EXAMPLE
You wrote that you have 5 parameters you want to query. The workaround that is used to address these limitations is to create and manage extra fields that have combination of parameters you want to query. Lets say you have table of users and you have there gender, age, name, surname and position. Lets say its huge database, so you have to think about amount of data you can load. Then if you want to use DynamoDB, you have to think about all queries you want to do.
You most likely want to search by name and surname, so you create index with surname as partition key and name as sort key (in such case you can search by surname or by both surname and name). It can work for lot of names, but you found out that some name combinations are too common and you need to filter by position as well. In such case, you create new field (column) called i.e. name-surname and whenever you create or update item, you will need to handle this field in your app to make sure it contains both of it, i.e. will-smith. Then you can make another index, that has name-surname as partition key and position as sort key. Now you can use it for such searches.
However you found out, that for some name-surname-position combination you get too many results and you dont want to handle it on application level and you want to limit results by age as well. Then you can create index with name-surname-position as partition key and age as sort key. At this moment you can also figure out that your old name-surname field and index can be removed as it server no purposes anymore (name and surname are handled by another index and for searching just name-surname-position you can use this index)
You want to query by gender as well sometimes? Its probably better to handle that in application level (or extra filter in db query) rather than creating new index that must be handled and payed for. There are only two types of gender (ok, lets say there exists more, but 99% of people will have just male or female) so its probably cheaper to just hide few fields on application level if someone wants to check only male/female/transgenders..., but load all of them. Because for extra index you would have to pay for every single insert, but this filter will be used only from time to time. Also when someone searches already by name, surname and position you dont expect that much results anyway, so if you get 20 (all genders) or just 10 (male only) results does not make much difference.
This ^^ was just example of how you can think and work with DynamoDB. How exactly you use it depends on your business logic.
Very important note: DynamoDB is very simple database that can only do very simple queries. It has little more functionality than Redis but a lot less functionality than traditional databases. The valid result of thinking about your business model/use-cases is that maybe you should NOT use the DynamoDB at all, because it can simply not satisfy your needs and queries.
Some basic thinking can look like this:
Is key-value persistant storage enough? Use DynamoDB
Is key-value persistant storage, where one item can have multiple keys and I can search and filter by maximum of 2 fields enough? Use DynamoDB
Is persistant storage, where I want to search single Table/Collection by many multiple keys with lot of options enough? Use MongoDB
Do I need to search through multiple tables or do complex joins or need transactions? Use traditional SQL database

How to generate and guarantee unique values in firestore collection?

Lets say we have a order collection in firestore where each order needs to have a unique readable random order number with lets say 8 digits:
{
orderNumber: '19456734'
}
So for every incoming order we want to generate this unique number. What is the recommended approach in firestore to make sure no other document is using it?
Note: One solution would be querying existing docs before saving, but this is not working in a concurrent scenario where multiple orders arrive at the same time (?).
The easiest to guarantee that some value is unique in a collection, is to use that value as the key/ID for the documents in that collection. Since keys/IDs are by definition unique in their collection, this implicitly enforces your requirement.
The only built-in way to generate unique IDs is by calling the add() method, which generates a UUID for the new document. If you don't want to use UUIDs to identify your orders, you'll have to roll your own mechanism.
The two most common approaches:
Generate a unique number and check if it's already taken. You'd do this in a transaction of course, to ensure no two instances can claim the same ID.
Keep a global counter (typically in a document at a well-known location) of the latest ID you've handed out, and then read-increment-write that in a transaction to get the ID for any new document. This is typically what other databases do for their built-in auto-ID fields.

How to delete Single-field indexes that generated automatically by firestore?

update:
TLDR;
if you reached here, you should recheck the way you build your DB.
Your document(s) probably gets expended over time (due to nested list or etc.).
Original question:
I have a collection of documents that have a lot of fields. I do not query documents even no simple queries-
I am using only-
db.collection("mycollection").doc(docName).get().then(....);
in order to read the docs,
so I don't need any indexing for this collection.
The issue is that firestore generates Single-field indexes automatically, and due to the amount of fields cause limitation exceeding of indexing:
And if I trying to add a field to one of the documents it throws me an error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field-indexing or to tell firestore to stop generating them.
I found this in firestore console:
but there is no way to disable this, and to disable auto indexing for a specific collection.
Any way to do it?
You can delete simple Indexes in Firestore firestore.
See this answer for more up to date information on creating and deleting indexes.
Firestore composite index permutation explosion?
If you go in to Indexes after selecting the firestore database and then select "single" indexes there is an Add exemption button which allows you to specify which fields in a Collection (or Sub-collection) have simple indexes generated by Firestore. You have to specify the Collection followed by the field. You then specify every field individually as you cannot specify a whole collection. There does not seem to be any checking on valid Collections or field names.
The only way I can think to check this has worked is to do a query using the field and it should fail.
I do this on large string fields which have normal text in them as they would take a long time to index and I know I will never search using this field.
Firestore creates two indexes for every simple field (ascending and descending) but it is also possible to create an exemption which removes one of these if you will never need the second one which helps improve performance and makes it less likely to hit the index limits. In addition you can select whether arrays are indexed or not. If you create a lot of entries it an Array, then this can very quickly hit the firestore limits on the number of indexes, so care has to be taken when using indexes and it will often be best to take the indexes off Arrays since the designer may have no control over how many Array data items are added with the result that the maximum index limit is reached and the application will get an error as the original poster explained.
You can also remove any simple indexes if you are not using them even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field that increases or decreases sequentially between documents) and you are not using this to force a sequence in queries, then there is a maximum write rate of 500 writes per second for the collection. In this case, this limit can be removed by removing the increasing and decreasing indexes.
Note that unlike the Realtime Database, fields created with Auto-ID do not guarantee any ordering as they are generated by firestore to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means that a timestamp is often needed to generate ordering but you may be able to design your collections / sub-collections data layout to avoid the need for a timestamp. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added.
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing (see screenshot below).
Once you have added one exemption, then you will get this screen.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is you can't do that right now with Firebase. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1MB per document.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as
scalable as other options, especially if your data expands over time.
With larger or growing lists, the document also grows, which can lead
to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure data you are trying to save?
A workaround for your problem would be to create compound indexes, or as a last resource, Firestore may not be suited to the needs for your app and Firebase Realtime Database can be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe that there currently exists the switch that you are looking for, so I think that leaves the following,
Globally disable built-in indexes and create all indexes explicitly. Painful and they have limits too.
A workaround where you treat your Cloud Firestore unfriendly content like a BLOB, like so:
To store,
const objIn = {text: 'my object with a zillion fields' };
const jsonString = JSON.stringify(this.objIn);
const container = { content: this.jsonString };
To retrieve,
const objOut = JSON.parse(container.content);

Resources