Fetch all keys from Firestore collection with admin sdk - firebase

I need to fetch all the ids/keys of a collection in Cloud Firestore. Currently I do it like this (groovy):
ApiFuture<QuerySnapshot> snapshot = firestoreClient.database.collection(bucket).get()
List<QueryDocumentSnapshot> documents = snapshot.get().getDocuments()
for (QueryDocumentSnapshot document : documents) {
    keys.add(document.id)
}
I run this on a collection that could potentially hold a lot of documents, let's say 30,000, which causes a java.lang.OutOfMemoryError: Java heap space.
The thing is that I don't need all the documents. As seen in my code, all I need is to check which documents are in the collection (i.e. a list of keys/IDs), but I have not found any way to grab them without fetching all the documents, which has a huge overhead.
I am using the Java Firebase Admin SDK (6.12.2).
So I'm hoping that there is a way to grab all the keys without the overhead and without my heap maxing out.

Calling get() will get you the full documents. But you should be able to do an empty selection. From the documentation for select():
public Query select(String... fields)
Creates and returns a new Query instance that applies a field mask to the result and returns the specified subset of fields. You can specify a list of field paths to return, or use an empty list to only return the references of matching documents.
So something like this:
firestoreClient.database.collection(bucket).select(new String[0])
Also see:
How to get a list of document IDs in a collection Cloud Firestore?
the Firestore Java reference documentation for the select function
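The same idea can be sketched in Python. This is a minimal sketch, not the Java SDK's API: it assumes the Python client's select() treats an empty field list like the Java method quoted above (references only), and that stream() hands back results incrementally. The helper name iter_document_ids is made up for illustration.

```python
from typing import Iterator


def iter_document_ids(collection) -> Iterator[str]:
    """Yield the ID of every document in `collection`, one at a time.

    `collection` is expected to behave like a Firestore CollectionReference
    (an assumption of this sketch): it needs select() and the resulting
    query needs stream().
    """
    # Assumption: an empty field list asks for document references only,
    # so no field data is transferred.
    id_only_query = collection.select([])
    # Iterate lazily instead of building a list of full snapshots.
    for snapshot in id_only_query.stream():
        yield snapshot.id
```

With a real client this would be called with something like firestore.Client().collection("my-bucket"), where the collection name is a placeholder. Consuming the generator in a loop keeps only one ID in memory at a time.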

Related

How to create one stream listening to multiple Firestore documents created from list of documents references in Flutter

I'm trying to create one stream that uses multiple document references that are stored in and fetched from Firebase Firestore.
Let's say I have two collections named users and documents. When a user is created, they get a document with their ID in the users collection, with a field named documentsHasAccessTo that is a list of references to documents inside the documents collection. It is important that these documents can be located in different subcollections inside the documents collection, so I don't want to query the whole documents collection and filter it. To save Firestore transfer and make it faster, I already know the paths to the documents, stored in the documentsHasAccessTo field.
So, for example, I can have a user with data inside the users/<user uid> document, with a documentsHasAccessTo field that stores 3 different document references.
I would like to achieve something like this (untested):
final userId = 'blablakfn1n21n4109';
final usersDocumentRef = FirebaseFirestore.instance.doc('users/$userId');
usersDocumentRef.snapshots().listen((snapshot) {
  final references = snapshot.data()['documentsHasAccessTo'] as List<DocumentReference>;
  final documentsStream = // create single query stream using all references from list
});
Keep in mind that it would also be great if this stream updated the query whenever documentsHasAccessTo changes, as in the example above. That is why I used snapshots() on usersDocumentRef rather than a single get() fetch.
The more I think about this, the more I'm starting to believe that this is simply impossible or that there's a simpler and cleaner solution. I'm open to anything.
You could use rxdart's switchMap and MergeStream:
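// Requires: import 'package:rxdart/rxdart.dart';
usersDocumentRef.snapshots().switchMap((snapshot) {
  // Firestore returns a List<dynamic>, so copy it into a typed list instead of casting
  final references = List<DocumentReference>.from(snapshot.data()['documentsHasAccessTo'] as List);
  // One snapshot stream per referenced document, merged into a single stream
  return MergeStream(references.map((ref) => ref.snapshots()));
});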

How do I get list of the document ID's inside a collection without getting the content of the documents in Firestore?

I want to do something like:
final collectionReference = Firestore.instance.collection('myCollection');
final List<String> documentList = collectionReference.getDocList();
OR
final query = collectionReference.orderBy('lastUpdated', descending: true).limit(100);
final List<String> documentList = query.getDocList();
Currently, get() or query.getDocuments() returns the whole document list along with all of its contents. We want to optimize reads, and our documents make full use of the 1 MiB per-document limit, so we don't want to download every document's contents. We just need the IDs for other purposes. Is there a way to do this?
Thanks
There is no way to do this with the Flutter APIs (or any of the web or mobile clients).
The only way is with backend code, where you have a method like select() (link to the nodejs API) on Query that lets you select which document fields to return. So, you could have your app call a backend to return the document IDs, but not directly in the app.
If you must query from the client, consider moving fields unnecessary for queries to documents in another collection with the same IDs, and request them only when needed.
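The backend approach could look roughly like the following Python sketch. It mirrors the ordered, limited query from the question. It assumes a server-side client whose queries offer order_by(), limit(), select() and stream(), and that an empty select() returns references only. The function name latest_document_ids is invented for this example.

```python
from typing import List


def latest_document_ids(collection, limit: int = 100) -> List[str]:
    """Return the IDs of the most recently updated documents.

    `collection` is assumed to behave like a server-side Firestore
    CollectionReference; only the IDs are read from the results.
    """
    query = (
        collection
        .order_by("lastUpdated", direction="DESCENDING")
        .limit(limit)
        # Assumption: an empty field list projects away all field data.
        .select([])
    )
    return [snapshot.id for snapshot in query.stream()]
```

A backend endpoint (for example a callable function, which is outside this sketch) would return this list to the app.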

Check if a document exists on Firestore without get() the full document data

So this is possible:
const docSnapshot = await firebase.firestore().collection("SOME_COL").doc("SOME_DOC").get();
console.log(docSnapshot.exists);
But it "downloads" the whole document just to check whether it exists. I'm currently working with some heavier documents, and I have a script that only needs to know whether they exist, without downloading them at that point.
Is there a way to check whether a document exists without .get(), so that I avoid downloading the document data?
It seems you are using the JavaScript SDK. With this SDK there isn't any way to only get a subset of the fields of a document.
One possible solution is to maintain another collection whose documents have the same IDs as the documents in the main collection but hold only a very small dummy field. You could use a set of Cloud Functions to synchronise the two collections (document creation/deletion).
On the other hand, with the Firestore REST API, it is possible, with the get method, to define a DocumentMask which defines a "set of field paths on a document" and is "used to restrict a get operation on a document to a subset of its fields". Depending on your exact use case, this can be an interesting and easier solution.
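As an illustration of the field-mask idea from a server-side client, here is a minimal Python sketch. It assumes the document reference's get() accepts a field_paths argument that works as such a mask, and that a masked read still reports whether the document exists. The helper name document_exists and the probe field name are hypothetical.

```python
def document_exists(doc_ref, probe_field: str = "probe") -> bool:
    """Check existence while requesting at most one (small) field.

    `doc_ref` is assumed to behave like a server-side Firestore
    DocumentReference. `probe_field` is a hypothetical field name; the
    field does not need to be present in the document.
    """
    # The mask limits which fields come back, so a heavy document
    # is not transferred in full.
    snapshot = doc_ref.get(field_paths=[probe_field])
    return snapshot.exists
```

This still counts as a document read. It only reduces the amount of data transferred.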

List all Firestore collections using Python admin SDK

I want to get all the collections stored in my Firestore database. I went through the documentation and found that getCollections() method on DocumentReference can be used if you're using Node.js server SDK. What's the Python equivalent of this?
The Python SDK has a Client.collections method that lists all top-level collections.
Once you have a document, you can get the subcollections of that by calling the DocumentReference.collections method.
I used this Python snippet to go into the top-level collection "my-collection" and find a document with a certain "id". Within that document I want to retrieve the documents of the last subcollection, since they contain the latest version of the data that I am searching for.
from google.cloud import firestore_v1

fs = firestore_v1.Client()
collection = fs.collection("my-collection")
doc = collection.document("id")
sub_collections = doc.collections()
# Exhaust the generator and keep only the last subcollection
*_, last = sub_collections
sub_docs = last.stream()
for sub_doc in sub_docs:
    print(sub_doc.to_dict())
Note: you have to make a "dummy" entry within the top-level documents, otherwise there is no link between the root collection and the underlying subcollections.
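To go beyond a single document, the two collections() methods mentioned above can be combined into a recursive walk. The following is only a sketch: it assumes collections expose id and list_documents(), that document references expose id and collections(), and that list_documents() also returns references to documents without data of their own. The name walk_collections is invented for this example.

```python
from typing import Iterator


def walk_collections(parent, prefix: str = "") -> Iterator[str]:
    """Yield the path of every collection reachable from `parent`.

    `parent` is either a client (top-level collections) or a document
    reference (its subcollections); both are assumed to offer collections().
    """
    for collection in parent.collections():
        collection_path = f"{prefix}{collection.id}"
        yield collection_path
        # Descend into each document to find nested subcollections.
        for doc_ref in collection.list_documents():
            yield from walk_collections(
                doc_ref, prefix=f"{collection_path}/{doc_ref.id}/"
            )
```

Calling list(walk_collections(fs)) with the client from the snippet above would visit every document reference, so on a large database this can be slow and should be limited to the subtree you actually need.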

Firestore query for subcollections on a deleted document

When using the Firebase console it is possible to see all documents and collections, even subcollections where the path has "documents" that do not exist.
This is illustrated in the picture included here, and as stated in the docs and on the screenshot as well. These documents won't appear in queries or snapshots. So how does the console find these nested subcollections, when a query does not return them?
Is it possible, somehow, to list these documents? Since the console can do it, it seems there must be a way.
And if it is possible to find these documents, is it possible to create a query that fetches all the documents that are nonexistent but limited to those that have a nested subcollection? (The set of all nonexistent documents would otherwise be infinite.)
The Admin SDK provides a listDocuments method with this description:
The document references returned may include references to "missing
documents", i.e. document locations that have no document present but
which contain subcollections with documents. Attempting to read such a
document reference (e.g. via .get() or .onSnapshot()) will return a
DocumentSnapshot whose .exists property is false.
Combining this with the example for listing subcollections, you could do something like the following:
// Admin SDK only
const collectionRef = firestore.collection('col');
return collectionRef.listDocuments().then(documentRefs => {
  // getAll() takes the references as separate arguments
  return firestore.getAll(...documentRefs);
}).then(documentSnapshots => {
  documentSnapshots.forEach(doc => {
    if (!doc.exists) {
      console.log(`Found missing document: ${doc.id}, getting subcollections`);
      // listCollections() lives on the DocumentReference (formerly getCollections())
      doc.ref.listCollections().then(collections => {
        collections.forEach(collection => {
          console.log('Found subcollection with id:', collection.id);
        });
      });
    }
  });
});
Note that the Firebase CLI uses a different approach. Via the REST API, it queries all documents below a given path, without having to know their specific location first. You can see how this works in the recursive delete code here.
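For comparison, the listDocuments approach described above might be sketched in Python as follows. It assumes the Python client's list_documents() behaves like the Admin SDK method quoted earlier (it may return references to missing documents) and that document references offer get() and collections(). The function name find_missing_documents is made up for illustration.

```python
from typing import Dict, List


def find_missing_documents(collection) -> Dict[str, List[str]]:
    """Map each missing document's ID to the IDs of its subcollections.

    `collection` is assumed to behave like a server-side Firestore
    CollectionReference.
    """
    missing: Dict[str, List[str]] = {}
    for doc_ref in collection.list_documents():
        # Documents that exist are skipped; only the "missing" ones matter here.
        if doc_ref.get().exists:
            continue
        missing[doc_ref.id] = [sub.id for sub in doc_ref.collections()]
    return missing
```

Note that this reads every existing document once just to test for existence, so it is best suited to small collections or one-off maintenance scripts.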
Is it possible to create a query that fetches all these subcollections that are nested under a document that does not exist?
Queries in Cloud Firestore are shallow, which means they only get documents from the collection that the query is run against. There is no way in Cloud Firestore to get documents from a top-level collection and other collections or subcollections in a single query. Firestore doesn't support queries across different collections in one go. A single query may only use properties of documents in a single collection or subcollection.
So in your case, even if one document does not exist (does not contain any properties), you can still query a collection that lives beneath it. In other words, you can query the queue subcollection that exists within the -LFNX ... 7UjS document, but you cannot query all queue subcollections within all documents. You can query only one subcollection at a time.
Edit:
According to your comment:
I want to find collections that are nested under documents that do not exist.
There is no way to find collections because you cannot query across different collections. You can only query against one. The simplest solution I can think of is to check whether a document within your items collection doesn't exist (has no properties), then create a query (items -> documentId -> queue) and check whether it has any results.
Edit2:
The Firebase Console is telling you, through the document IDs shown in italics, that those documents do not exist. They do not exist because you didn't create them at all. What you did was create a subcollection under a document that never existed in the first place. In other words, it merely "reserves" an ID for a document in that collection and then creates a subcollection under it. Typically, you should only create subcollections of documents that actually exist, but this is what it looks like when the document doesn't exist.
In Cloud Firestore, documents and subcollections don't work like the filesystem files and directories you're used to. If you create a subcollection under a document, it doesn't implicitly create any parent documents. Subcollections are not tied in any way to a parent document. In other words, there is no physical document at that location, but there is other data under the location.
In the Firebase console those document IDs are displayed so you can navigate down the tree and get the subcollections and documents that exist beneath them. At the same time, the console is warning you that those documents do not exist, by displaying their IDs in italics. So you cannot display or use them, for the simple reason that they hold no data themselves. If you want to correct that, you have to write at least one property that can hold a value. That way, those documents will hold some data and you can do whatever you want with them.
P.S. In Cloud Firestore, if you delete a document, its subcollections will continue to exist and this is because of the exact same reason I mentioned above.
