Structuring a firestore database to filter by what is not in the array? - firebase

I am building a collection that will contain over a million documents. Each document will contain one token and a history table. When a process retrieves a token, it stores its process ID in the history table inside the document so that the token can never be used again by the same process. The tokens are reusable by different processes. I want each process to pull a document/token and never be able to pull that same document/token again.
My approach is to store a history table in each document listing the processes that have used the token. That is why I need to query for what is not in the array.
Firestore does not have a condition for searching for what is not in an array. How would I perform a query like the one below, where array-does-not-contain is a placeholder for an operator that matches documents whose history array does not contain 'process-001'?
db.collection('tokens').where('history', 'array-does-not-contain',
'process-001').limit(1).get();
Below is how I'm planning to structure my collection,
My actual problem,
I have multiple processes running, and I only want each process to pull documents from Firebase that it has never seen before. The Firebase collection will hold over a million documents and keep growing.

Firestore is not very well suited for queries that need to look for things that don't exist. The problem is that the indexes it uses are only meant to tell you if things exist. The universe of strings that don't exist would be impossible to efficiently quantify for indexing.
The only way to make this happen is to know the names of all the processes ahead of time, and create values for them in the index. You would do this with a map type object, not an array:
- token: "1234"
- history: {
"process-001": false,
"process-002": false,
"process-003": false
}
This document can be queried to find out if "history.process-001" has a value of false, then updated to true when the process uses it. But again, without all the process names known ahead of time and populated in each document, the query is not possible.
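As a rough illustration of the map approach, here is a minimal sketch in plain JavaScript. It does not talk to Firestore: findUnusedToken is an in-memory stand-in for an equality query on "history.<process name>" being false, and PROCESS_NAMES, buildTokenDoc, findUnusedToken and markUsed are hypothetical names made up for this example.

```javascript
// Hypothetical process names; the approach only works if this list is complete.
const PROCESS_NAMES = ['process-001', 'process-002', 'process-003'];

// Build a token document whose history map has one entry per known process.
function buildTokenDoc(token, processNames) {
  const history = {};
  for (const name of processNames) {
    history[name] = false; // false = this process has not used the token yet
  }
  return { token, history };
}

// In-memory stand-in for querying documents where history.<processName> == false.
function findUnusedToken(docs, processName) {
  const match = docs.find((doc) => doc.history[processName] === false);
  return match === undefined ? null : match;
}

// Stand-in for the update that flips the flag once the process has used the token.
function markUsed(doc, processName) {
  doc.history[processName] = true;
}
```

Because every process name already has a value in every document, "not used yet" becomes something that exists and can be matched with an equality filter, which is the whole point of the map.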
See also:
Firestore get documents where value not in array?
How to query Cloud Firestore for non-existing keys of documents

Related

How to structure data in Firestore for multiple user entries

This is my first time using a NOSQL database and I'm really struggling to work out how to structure my data.
I have an app that predicts a user's mood, and the user can then select whether the prediction is right or not. So I need to save both the prediction and the actual result. I want to be able to pull the latest result from Firebase and display it in the app.
I understand how I'd do this on an SQL DB and understand how to write an SQL query to get that data back out.
For my Firebase DB I thought of the following structure
The document name is the user's ID, and it stores multiple arrays based on the timestamp, but I can only seem to use orderBy on a collection, not on a document, so I'm not sure how to get this back.
The fact that this seems so difficult leads me to believe I've implemented the DB wrong to begin with.
Structure of DB is as follows:
I should add that it all works fine for the USER_TABLE, as it's one document ID and a single entry, so I've no problem retrieving that.
Thanks for your help!
orderBy is an instruction to the database to order documents on the server, before it returns them to your app. To order the entries inside a document, you can just do that inside your application code after it receives the document(s).
There is in itself nothing wrong with storing these entries in a single document. Just keep in mind that:
A document can be at most 1MB in size, so make sure this fits your maximum number of entries.
Firestore only ever returns full documents, so you will either get all entries in a document, or none of them.
You won't be able to order or filter the entries inside a single document. If that is a requirement for you, consider storing each entry in its own document in a subcollection. Note that this will increase the number of documents each user reads though, which will increase the cost.
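Ordering the entries in application code could look roughly like the sketch below. The document shape is an assumption (the question's actual structure isn't shown): an entries array whose items carry a numeric timestamp plus hypothetical predicted and actual fields. latestEntry is a made-up helper name.

```javascript
// Assumed shape of the user's document data:
// { entries: [{ timestamp: <number>, predicted: <string>, actual: <string> }, ...] }
function latestEntry(docData) {
  const entries = Array.isArray(docData.entries) ? docData.entries : [];
  if (entries.length === 0) {
    return null;
  }
  // Copy before sorting so the object received from the database is left untouched.
  const newestFirst = [...entries].sort((a, b) => b.timestamp - a.timestamp);
  return newestFirst[0];
}
```

You would call something like this on the data of the document once it has been read; the whole document is still downloaded, so this only helps with ordering, not with cost or size.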

Resolve FK in firestore

I have some documents in Firestore with some fields in them. For example, the collection "details" looks like this:
{
id: "",
fields1: "",
userFK: Reference to users collection
}
Now I need to resolve userFK on the fly, meaning that I don't want to first fetch all the documents and then call userFK.get() on each one.
Is there any method for this? It would be like doing a $lookup, which is supported in MongoDB.
In some cases I even want to fetch documents from the "details" collection based on specific fields in users.
There is no way to get documents of multiple types from Firestore with a single read operation. To get the user document referenced by userFK you will have to perform a separate read operation.
This is normal when using NoSQL databases like Cloud Firestore, as they typically don't support any server-side equivalent of a SQL JOIN statement. The performance of loading these additional details is not as bad as you may think though, so be sure to measure how long it takes for your use-case before writing it off as not feasible.
If this additional load is prohibitive for a scenario, an alternative is to duplicate the necessary data of the user into each details document. So instead of only storing the reference to their document, you'd for example also store the user name.
This puts more work on the write operation, but makes the read operations simpler and more scalable. This is the common trade-off of space vs time, where in NoSQL databases you'll often find yourself trading space for time: storing duplicate data so that reads stay simple and fast.
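As a small sketch of that duplication, the helper below builds a details document that keeps the reference and also carries a copy of the user's name. buildDetailsDoc, the userName field and the name property on the user are hypothetical; only userFK comes from the question.

```javascript
// Build a "details" document that stores the user reference plus a duplicated name.
function buildDetailsDoc(fields, userRef, user) {
  return {
    ...fields,
    userFK: userRef,     // still available when the full user document is needed
    userName: user.name, // duplicated copy, readable without a second read
  };
}
```

The price is that every details document holding userName has to be rewritten when the user changes their name.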
If you're new to NoSQL data modeling, I highly recommend:
NoSQL data modeling
Getting to know Cloud Firestore

Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
postID: 'someID',
locations: ( <---- collection
"path/to/post/location1",
"path/to/post/location2",
...
)
}
This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep a track of this data for security reasons....unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape trying to keep all that up-to-date as it is shared across a platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you are duplicating data, there is one thing that you need to keep in mind: in the same way you are adding data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place that it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all operations that we need to do in order to have consistent data, we add all operations to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that please see:
How to do a bulk update in Firestore
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass to each document() method, the corresponding document id in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents. You have to identify the documents by their ids.
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't consider that necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should think again about this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow reads or writes as a whole, or apply separate rules to each CRUD operation you need.
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to keep the operations in a key-value data structure. Let's assume we have a map that looks like this:
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
//And so on
Iterate through the map, add all those keys and values to a batch, commit the batch, and that's it.
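The same idea, sketched in JavaScript under some assumptions: db is a Firestore instance from one of the namespaced SDKs (exposing doc() and batch()), the path layout is the one from the question, and duplicatePostPaths and updateAllCopies are hypothetical helper names. Treat it as an outline rather than a finished implementation.

```javascript
// Build the document path of every duplicated copy of a post.
// Each location is assumed to look like { boardOwnerID, boardID }.
function duplicatePostPaths(postID, locations) {
  return locations.map(
    ({ boardOwnerID, boardID }) =>
      `users/${boardOwnerID}/boards/${boardID}/posts/${postID}`
  );
}

// Queue the same update for every copy, then commit them together.
async function updateAllCopies(db, paths, changes) {
  const batch = db.batch();
  for (const path of paths) {
    batch.update(db.doc(path), changes);
  }
  await batch.commit();
  return paths.length; // number of copies that were queued
}
```

Since there are no wildcards in document paths, every copy has to be listed explicitly; where that list comes from (a tracker collection, a query, application state) is the design decision discussed above.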

Firestore checking if username exists, best model to not query the whole database?

Hello, I'm developing an Android app using Firebase's Firestore. My concern is about creating a username for a user who is signing up for my app. I know I have to check whether the username exists in my database, but what if you have 1 million users or 5? I don't think the results will be fast when you query the whole database. Is querying the whole database the only approach? Or should I create a collection called usernames with 24 documents inside, where the first document holds a collection of usernames starting with a, the second document holds a collection of usernames starting with b, and so on? I need your help. Thank you.
Actually one of the key characteristics of Firestore is exactly that: the performance of a query is proportional to the size of your result set, not your data set.
So the query performance that you get for finding 1 document in a collection of 5, 24 or 1 million documents will be exactly the same.
In Cloud Firestore, you can use queries to retrieve individual,
specific documents or to retrieve all the documents in a collection
that match your query parameters. Your queries can include multiple,
chained filters and combine filtering and sorting. They're also
indexed by default, so query performance is proportional to the size
of your result set, not your data set.
So the answer is that you should query your already existing collection of documents and not create smaller collection(s) with a subset of documents for the sake of query performance.
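A minimal sketch of such a check, assuming db is a Firestore instance from one of the namespaced SDKs and that users live in a hypothetical users collection with a username field (neither name is given in the question):

```javascript
// Returns true if at least one user document already has this username.
async function usernameExists(db, username) {
  const snapshot = await db
    .collection('users')
    .where('username', '==', username)
    .limit(1) // one match is enough to know the name is taken
    .get();
  return !snapshot.empty;
}
```

The limit(1) keeps the result set at one document at most, which is what the cost of the query depends on.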

How to delete Single-field indexes that generated automatically by firestore?

update:
TLDR;
If you have reached this point, you should recheck the way you build your DB.
Your document(s) probably expand over time (due to a nested list, etc.).
Original question:
I have a collection of documents that have a lot of fields. I never query these documents, not even with simple queries.
I only use
db.collection("mycollection").doc(docName).get().then(....);
to read the docs,
so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically, and because of the number of fields the documents exceed the indexing limit:
If I try to add a field to one of the documents, it throws this error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field-indexing or to tell firestore to stop generating them.
I found this in firestore console:
but there is no way to disable this, and to disable auto indexing for a specific collection.
Any way to do it?
You can delete simple indexes in Firestore.
See this answer for more up to date information on creating and deleting indexes.
Firestore composite index permutation explosion?
If you go into Indexes after selecting the Firestore database and then select "single" indexes, there is an Add exemption button which allows you to specify which fields in a collection (or sub-collection) have simple indexes generated by Firestore. You have to specify the collection followed by the field. You then specify every field individually, as you cannot specify a whole collection. There does not seem to be any checking for valid collection or field names.
The only way I can think to check this has worked is to do a query using the field and it should fail.
I do this on large string fields which have normal text in them as they would take a long time to index and I know I will never search using this field.
Firestore creates two indexes for every simple field (ascending and descending), but you can create an exemption that removes one of them if you will never need it. This helps performance and makes it less likely that you hit the index limits. You can also select whether arrays are indexed or not. If you create a lot of entries in an array, this can very quickly hit the Firestore limits on the number of indexes, so it will often be best to take the indexes off arrays: the designer may have no control over how many array items are added, and once the maximum index limit is reached the application will get an error, as the original poster explained.
You can also remove any simple indexes if you are not using them even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field that increases or decreases sequentially between documents) and you are not using this to force a sequence in queries, then there is a maximum write rate of 500 writes per second for the collection. In this case, this limit can be removed by removing the increasing and decreasing indexes.
Note that unlike the Realtime Database, fields created with Auto-ID do not guarantee any ordering as they are generated by firestore to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means that a timestamp is often needed to generate ordering but you may be able to design your collections / sub-collections data layout to avoid the need for a timestamp. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added.
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing (see screenshot below).
Once you have added one exemption, then you will get this screen.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is you can't do that right now with Firebase. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1MB per document.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as
scalable as other options, especially if your data expands over time.
With larger or growing lists, the document also grows, which can lead
to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure data you are trying to save?
A workaround for your problem would be to create compound indexes. As a last resort, Firestore may not be suited to the needs of your app, and Firebase Realtime Database may be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe that there currently exists the switch that you are looking for, so I think that leaves the following,
Globally disable built-in indexes and create all indexes explicitly. Painful and they have limits too.
A workaround where you treat your Cloud Firestore unfriendly content like a BLOB, like so:
To store,
const objIn = { text: 'my object with a zillion fields' };
// Serialize the whole object so Firestore sees (and indexes) a single string field.
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };
To retrieve,
const objOut = JSON.parse(container.content);
