I need to create a firestore doc which also has a collection, ideally in a single write operation.
I'm not seeing anything like this in the documentation, so failing that any tips on getting the created doc id and then adding multiple documents to a collection?
Edit: I'm developing in typescript/js
You can use a transaction or batch write to perform atomic operations across multiple documents. In both cases, you need to know the IDs of all the documents you want to create ahead of time. Note that you don't need to create a document in order for a subcollection to exist under its path. Documents don't actually "contain" subcollections; subcollections are currently just a technique for organizing your data.
It wasn't really clear from your question, but if your first document requires a generated ID, you can use the example code in the documentation that generates a DocumentReference object, which you can populate later in your transaction or batch.
Since you're working in TypeScript/JavaScript, you'll end up calling a method named doc() with no arguments on a CollectionReference to generate a DocumentReference with a new ID (some other SDKs call this method document()); see the documentation linked here for the exact code sample.
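For example, a minimal TypeScript/JavaScript sketch, assuming made-up collection names ('posts' and 'comments') and the db handle from the docs:

const batch = db.batch();
// doc() with no arguments returns a DocumentReference with a generated ID;
// nothing is written until the batch commits
const parentRef = db.collection('posts').doc();
batch.set(parentRef, { title: 'Hello' });
// Documents in the subcollection can be staged in the same batch
batch.set(parentRef.collection('comments').doc(), { text: 'First comment' });
batch.set(parentRef.collection('comments').doc(), { text: 'Second comment' });
// Everything above succeeds or fails together
await batch.commit();
console.log('Created parent with ID', parentRef.id);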
What I understand after reading the official Firestore batch operations doc is that batch operations can perform multiple writes and change multiple documents at once.
However, the sample code for updates in the official Firestore doc says:
var sfRef = db.collection("cities").doc("SF");
batch.update(sfRef, {"population": 1000000});
Now, I don't understand why they gave the name of a specific document when adding an update to the batch. I thought the whole point of batch operations was to update multiple documents in a collection, so why are we naming a single document and limiting the operation to just that one?
Thanks in advance :)
Now, I don't understand why they gave the name of a specific document when adding an update to the batch.
It's required to call out each individual document to create, update, or delete in the batch. There are no alternatives to this. Firestore doesn't offer any way to bulk-update multiple documents as the result of a query, similar to SQL "update where" statements. If you have multiple documents to update in a batch, you would instead have to (see the sketch after this list):
1. Perform the query to find each document
2. Iterate the results and collect references to each document
3. Add each document update to the batch using the batch API
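A minimal sketch of those three steps in TypeScript/JavaScript, reusing the cities sample from the docs (the state filter is made up):

// 1. Perform the query to find the documents
const snapshot = await db.collection('cities').where('state', '==', 'CA').get();
// 2 + 3. Collect each matching reference and stage its update in the batch
const batch = db.batch();
snapshot.forEach((doc) => {
  batch.update(doc.ref, { population: 1000000 });
});
// A single batch is capped (500 writes at the time of writing),
// so chunk very large result sets into multiple batches
await batch.commit();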
I'm trying to clean up my Google Cloud Firestore database, and I have some subcollections with no parent doc (the parent was deleted). How can I find all of those, using the Firebase Admin SDK, so I can delete them?
You will end up writing a lot of code for this. I'm going to link to nodejs APIs.
For each collection where there could be missing documents, you will need to query that collection with listDocuments(). That will return a list of all documents in the collection, including the missing documents that have subcollections. You will then need to iterate the DocumentReferences returned in that list and call get() on every one of them. The returned DocumentSnapshot will tell you whether the document exists via its exists property.
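A hedged sketch of that with the Node.js Admin SDK (the collection name is made up; db.getAll() just saves round trips compared to calling get() on each reference individually):

// listDocuments() also returns "missing" documents that only exist
// as the parent path of a subcollection
const refs = await db.collection('cities').listDocuments();
// Fetch all snapshots; chunk this for very large collections
const snapshots = await db.getAll(...refs);
// Snapshots whose exists property is false are the orphaned parents
const orphaned = snapshots.filter((snap) => !snap.exists).map((snap) => snap.ref);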
After you have all the DocumentReference objects referring to missing documents, you can follow the instructions in this other question that describes how to delete all nested subcollections under a DocumentReference, or go straight to the Firebase documentation.
I am building a collection that will contain over a million documents. Each document will contain one token and a history table. When a process retrieves a token, it stores its process id in the history table inside the document so the token can never be used again by the same process. Tokens are reusable by different processes. I want each process to pull a document/token and never be able to pull that same document/token again.
My approach is to have a stored history table in each document with the processes that have used the token. That is why you need to query for what is not in the array.
Firestore does not have a condition that lets you search for what is not in an array. How would I perform a query like the one below, where array-does-not-contain is a placeholder for searching an array that does not contain 'process-001'?
db.collection('tokens').where('history', 'array-does-not-contain', 'process-001').limit(1).get();
Below is roughly how I'm planning to structure my collection; each document holds one token plus a history array of the process ids that have already used it, something like:
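- token: "1234"
- history: ["process-001", "process-002"]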
My actual problem: I have multiple processes running, and I only want each process to pull documents from Firebase that it has never seen before. The collection will be over a million documents and growing.
Firestore is not very well suited for queries that need to look for things that don't exist. The problem is that the indexes it uses are only meant to tell you if things exist. The universe of strings that don't exist would be impossible to efficiently quantify for indexing.
The only way to make this happen is to know the names of all the processes ahead of time and create values for them in the index. You would do this with a map-type object, not an array:
- token: "1234"
- history: {
"process-001": false,
"process-002": false,
"process-003": false
}
This document can be queried to find out if "history.process-001" has a value of false, then updated to true when the process uses it. But again, without all the process names known ahead of time and populated in each document, the query is not possible.
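A hedged sketch of that query-then-claim flow in TypeScript/JavaScript (the collection and field names are from the question; in production you would likely wrap it in a transaction so that two process instances can't claim the same token at once):

// Find one token that process-001 has not used yet
const snap = await db.collection('tokens')
    .where('history.process-001', '==', false)
    .limit(1)
    .get();
if (!snap.empty) {
  const tokenDoc = snap.docs[0];
  // Dot notation updates just that nested map key
  await tokenDoc.ref.update({ 'history.process-001': true });
  console.log('Claimed token', tokenDoc.get('token'));
}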
See also:
Firestore get documents where value not in array?
How to query Cloud Firestore for non-existing keys of documents
If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that it can be updated correctly and inconsistencies avoided?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location where the data can exist, used to reconstruct the paths that require updating?
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
postID: 'someID',
locations: ( <---- collection
"path/to/post/location1",
"path/to/post/location2",
...
)
}
This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep track of this data for security reasons... unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape trying to keep all that up-to-date as it is shared across a platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you are duplicating data, there is one thing you need to keep in mind. In the same way you add data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place where it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all the operations we need to perform in order to keep the data consistent, we add every operation to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that, please see:
How to do a bulk update in Firestore
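As a hedged TypeScript/JavaScript sketch of such a fan-out, suppose a user's name is duplicated into an authorName field on each of their posts (all names and paths below are made up):

const userId = 'uid123';
const newName = 'New Name';
// Find every post that embeds this user's name
const posts = await db.collection('posts').where('authorId', '==', userId).get();
// Stage the canonical update plus every duplicate in one batch
const batch = db.batch();
batch.update(db.collection('users').doc(userId), { name: newName });
posts.forEach((doc) => batch.update(doc.ref, { authorName: newName }));
// All of the writes succeed or fail together
await batch.commit();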
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass the corresponding document id to each document() method in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore document paths; you have to identify the documents by their ids.
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't consider that necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow or deny reads and writes, and even apply rules to each CRUD operation separately.
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to add each operation to a key-value data structure. Let's assume we have a map that looks like this:
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
//And so on
Iterate through the map, add all those keys and values to the batch, commit the batch, and that's it.
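The same idea in TypeScript/JavaScript would be a map from references to the data to write there (the two references and values below are purely illustrative):

const reference1 = db.doc('users/uid123');
const reference2 = db.doc('posts/post456');
const updates = new Map([
  [reference1, { name: 'New Name' }],
  [reference2, { authorName: 'New Name' }],
]);
// Map.forEach passes (value, key), i.e. (data, reference)
const batch = db.batch();
updates.forEach((data, ref) => batch.update(ref, data));
await batch.commit();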
Is it possible to create multiple documents against a collection using Cloud Firestore in the same transaction?
I'm looking at the documentation on batched writes. Unless I'm mistaken (I'm new to Firebase, so that could be the case), these examples are meant to demonstrate batched writes, but they only show a single field being updated.
Yes. Batched writes work both for updating a single field on multiple documents and for creating a bunch of documents at once.
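For instance, a hedged sketch with the JavaScript SDK that creates several documents in one atomic batch (the collection and field are made up):

const batch = db.batch();
['Tokyo', 'Paris', 'Lima'].forEach((name) => {
  // doc() with no arguments generates a new ID, so each set() creates a new document
  batch.set(db.collection('cities').doc(), { name });
});
// All three documents are created together, or none are
await batch.commit();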
Follow this link to see a similar case I had and solved. It talks about a collection whose documents are created automatically when a document is created; you will get an idea of how to solve your issue from it.
The link: When collection is created, the documents are added to it at once
I hope it helps.