Optimizing the number of reads from the Firestore server using caching or snapshot listeners

I am rendering a search view using Firebase; basically, the search is powered by a Firebase query.
I am using the following code:
Query query = FirebaseUtils.buildQuery(
fireStore, 'customers', filters, lastDocument, documentLimit);
print("query =" + query.toString());
QuerySnapshot querySnapshot = await query.getDocuments();
print("Got reply from firestore. No of items =" + querySnapshot.documents.length.toString());
Questions:
If the user runs the same query again and again, it still hits the server. I checked this using doc.metadata.isFromCache, and it always returns false.
Will using query snapshots help reduce the number of reads for this search query? I guess not, as the user is changing the query again and again.
Any other way to limit the number of reads?

If the user runs the same query again and again, it still hits the server. I checked this using doc.metadata.isFromCache, and it always returns false.
If you are online, isFromCache will always return false, and that's the expected behavior, since Firestore always checks the server for changes. If you want to force the data to be read from the cache while you are online, you have to tell Firestore explicitly by passing Source.CACHE (Source.cache in the Flutter plugin) to your get() call. If you're offline, it will always return true.
Will using query snapshots help reduce the number of reads for this search query? I guess not, as the user is changing the query again and again.
No, it won't. A query snapshot is basically an object that contains the results of your query. However, if you perform the same query "again and again" and nothing has changed on the server, you will not be charged for any read operations, because the second time you perform the query, the results come from the cache. If you perform a new search each time, you'll be billed a number of read operations equal to the number of documents returned by your query. Furthermore, if a new search returns documents that are already in your cache, you'll be billed a read operation only for the new ones.
Any other way to limit the number of reads?
The simplest method to limit the results of a query is to use a limit() call and pass as an argument the number of elements you want your query to return:
limit(10)
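Besides limit(), which caps how many documents a single query can bill, an app can avoid re-running an identical search by memoizing results itself. The sketch below is an illustration under assumptions, not a Firestore feature: `fetchPage(params)` is a hypothetical function that runs the actual query, the cursor should be passed as a document id (a snapshot object would not serialize into a useful key), and entries expire after `ttlMs` because cached results can go stale.

```javascript
// App-side memoization of query results. Firestore does not do this for
// get() calls made while online; this is a hypothetical helper layer.

// Serialize query parameters with sorted keys so that equal queries
// produce equal cache keys regardless of property order.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']'
  }
  if (value !== null && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map(k => JSON.stringify(k) + ':' + stableStringify(value[k]))
      .join(',') + '}'
  }
  return JSON.stringify(value)
}

// fetchPage(params) is assumed to run the real Firestore query and
// return the documents (or a promise of them).
function createQueryCache(fetchPage, ttlMs = 60000, now = () => Date.now()) {
  const entries = new Map()
  return function cachedQuery(params) {
    const key = stableStringify(params)
    const hit = entries.get(key)
    if (hit && now() - hit.storedAt < ttlMs) {
      return hit.result // served from memory: no server round trip
    }
    // Cache the promise itself so concurrent identical searches share one request.
    const result = Promise.resolve(fetchPage(params))
    entries.set(key, { result, storedAt: now() })
    // Forget failed requests so the next call retries.
    result.catch(() => {
      if (entries.get(key)?.result === result) entries.delete(key)
    })
    return result
  }
}
```

The trade-off is freshness: within the TTL a repeated search costs no reads, but it will not see documents changed on the server in the meantime.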

Related

How does Cosmos DB Continuation Token work?

At first sight, it's clear what the continuation token does in Cosmos DB: attaching it to the next query gives you the next set of results. But what does "next set of results" mean exactly?
Does it mean:
the next set of results as if the original query had been executed completely without paging at the time of the very first query (skipping the appropriate number of documents)?
the next set of results as if the original query had been executed now (skipping the appropriate number of documents)?
Something completely different?
Option 1 would seem preferable but is unlikely, given that the server would need to store unbounded amounts of state. But option 2 is also problematic, as it may result in inconsistencies, e.g. the same document may be served on multiple pages if the underlying data changes between page queries.
Cosmos DB query executions are stateless on the server side. The continuation token is used to recreate the state of the index and track the progress of the execution.
"Next set of results" means the query is executed again, starting from a "bookmark" left by the previous execution. This bookmark is provided by the continuation token.
Documents created during continuations
They may or may not be returned depending on the position of insert and query being executed.
Example:
SELECT * FROM c ORDER BY c.someValue ASC
Let us assume the bookmark has someValue = 10; the query engine resumes processing from the continuation token at someValue = 10.
If you were to insert a new document with someValue = 5 in between query executions, it will not show up in the next set of results.
If the new document sorts after the bookmark (someValue > 10), it will show up in the next set of results.
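The bookmark behaviour can be illustrated with a small in-memory model of the ORDER BY c.someValue query. This is a sketch under simplifying assumptions: a single partition, and a token that holds only the last sort value plus the id as a tie-breaker. The real continuation token is opaque and carries more state.

```javascript
// Simplified model of stateless, bookmark-based paging for
//   SELECT * FROM c ORDER BY c.someValue ASC
// The token is just { someValue, id } of the last returned document;
// it is NOT the real Cosmos DB continuation token format.
function queryPage(docs, token, pageSize) {
  // Re-run the whole query on the current data (no server-side state).
  const sorted = [...docs].sort(
    (a, b) => a.someValue - b.someValue || a.id.localeCompare(b.id))
  // Skip everything up to and including the bookmark.
  const remaining = token
    ? sorted.filter(d => d.someValue > token.someValue ||
        (d.someValue === token.someValue && d.id.localeCompare(token.id) > 0))
    : sorted
  const page = remaining.slice(0, pageSize)
  const last = page[page.length - 1]
  // Only hand out a token if more results remain after this page.
  const nextToken = remaining.length > pageSize
    ? { someValue: last.someValue, id: last.id }
    : null
  return { page, nextToken }
}
```

With a bookmark at someValue = 10, a document inserted with someValue = 5 is never returned on later pages, while one inserted with someValue = 15 is.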
Documents updated during continuations
The same logic as above applies to updates as well (see "Chances of duplicates" below).
Documents deleted during continuations
They will not show up in the next set of results.
Chances of duplicates
In case of the below query,
SELECT * FROM c ORDER BY c.remainingInventory ASC
If the remainingInventory was updated after the first set of results and it now satisfies the ORDER BY criteria for the second page, the document will show up again.
Cosmos DB doesn’t provide snapshot isolation across query pages.
However, according to the product team, this is an incredibly uncommon scenario, because queries over continuations are very quick and in most cases all query results are returned on the first page.
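The duplicate case can be reproduced with the same kind of simplified model: pages of size 1, a bookmark on remainingInventory, and one document updated between page requests. The function names and data are hypothetical; the point is only that a stateless resume-after-bookmark scheme serves the updated document twice.

```javascript
// Simplified model, not Cosmos DB internals: each page re-runs
//   SELECT * FROM c ORDER BY c.remainingInventory ASC
// and resumes after the sort value bookmarked by the previous page.
// Values are assumed distinct, so no tie-breaker is needed.
function nextPage(docs, afterValue, pageSize) {
  return docs
    .filter(d => afterValue === undefined || d.remainingInventory > afterValue)
    .sort((a, b) => a.remainingInventory - b.remainingInventory)
    .slice(0, pageSize)
}

// Page through three documents one at a time, updating document A
// after the first page has been served. Returns the ids in serving order.
function pageThroughWithUpdate() {
  const docs = [
    { id: 'A', remainingInventory: 1 },
    { id: 'B', remainingInventory: 5 },
    { id: 'C', remainingInventory: 10 },
  ]
  const served = []
  let page = nextPage(docs, undefined, 1)
  served.push(page[0].id)
  // Capture the bookmark before the update changes the document.
  let bookmark = page[0].remainingInventory
  docs[0].remainingInventory = 7 // A now sorts after B, on a later page
  for (;;) {
    page = nextPage(docs, bookmark, 1)
    if (page.length === 0) break
    served.push(page[0].id)
    bookmark = page[0].remainingInventory
  }
  return served // A appears twice
}
```

If a backend job must not process a document twice, one option is to track the ids already seen and skip repeats.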
Based on preliminary experiments, the answer seems to be option #2, or more precisely:
Documents created after serving the first page are observable on subsequent pages
Documents updated after serving the first page are observable on subsequent pages
Documents deleted after serving the first page are omitted on subsequent pages
Documents are never served twice
The first statement above contradicts information from MSFT (cf. Kalyan's answer). It would be great to get a more qualified answer from the Cosmos DB Team specifying precisely the semantics of retrieving pages. This may not be very important for displaying data in the UI, but may be essential for data processing in the backend, given that there doesn't seem to be any way of disabling paging when performing a query (cf. Are transactional queries possible in Cosmos DB?).
Experimental method
I used Sacha Bruttin's Cosmos DB Explorer to query a collection with 5 documents, because this tool allows playing around with the page size and other request options.
The page size was set to 1, and Cross Partition Queries were enabled. Different queries were tried, e.g. SELECT * FROM c or SELECT * FROM c ORDER BY c.name.
After retrieving page 1, new documents were inserted, and some existing documents (including documents that should appear on subsequent pages) were updated and deleted. Then all subsequent pages were retrieved in sequence.
(A quick look at the source code of the tool confirmed that ResponseContinuationTokenLimitInKb is not set.)

How to return the total matches on a Cosmos db query

I have set up an API that queries our Cosmos DB and returns the JSON results to the front-end app. There is a user-defined limit on the number of results. If the number of results exceeds the limit, I pass the token back to the front end so it can request the next group of rows. The issue is that I would like to return the total number of matches to the application. I have looked at the query statistics but don't see a total count anywhere.
On the call to CreateDocumentQuery, I'm setting MaxItemCount to the limit and RequestContinuation to either null or the continuation token. Looking at QueryMetrics, I found RetrievedDocumentCount, but that does not seem to have the correct value.
Thanks,
J
The x-ms-max-item-count request header controls how many documents are returned per request.
The default value is 100.
If your query returns 150 documents, your request will return the first 100 documents along with a continuation token in the response header (x-ms-continuation). If there is a token, you need to send another request with that token to get the rest of the data.
The SDK should be doing that for you automatically. Can you share some of your code? I might have a better answer then.
You can check out my post about this too.
https://h-savran.blogspot.com/2019/04/introduction-to-continuation-tokens-in.html
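The token loop described above looks roughly like this. A hypothetical in-memory service stands in for Cosmos DB (its numeric string token is an assumption; real tokens are opaque), and `readAll` keeps resending the token until none comes back. Note that counting results this way means reading every page; running a separate aggregate query such as SELECT VALUE COUNT(1) FROM c WHERE ... is one way to get a total without returning the documents.

```javascript
// Hypothetical stand-in for the service: returns up to maxItemCount documents
// plus a continuation token while more remain (like x-ms-continuation).
function makeFakeService(allDocs) {
  let requests = 0
  return {
    get requests() { return requests },
    query(maxItemCount = 100, continuation = null) {
      requests++
      const start = continuation === null ? 0 : Number(continuation)
      const items = allDocs.slice(start, start + maxItemCount)
      const end = start + maxItemCount
      return { items, continuation: end < allDocs.length ? String(end) : null }
    },
  }
}

// Keep sending the token back until the service stops returning one.
function readAll(service, maxItemCount) {
  const results = []
  let token = null
  do {
    const response = service.query(maxItemCount, token)
    results.push(...response.items)
    token = response.continuation
  } while (token !== null)
  return results
}
```

With 150 matching documents and a page size of 100, this makes two requests, mirroring the example in the answer.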

Firestore query get size of results without reading the documents?

I have an app that returns a list of health foods. There will be approximately 10000-20000 foods (documents) in the product collection.
These foods are queried by multiple fields using arrayContains. This may be categories, subcategories and when the user searches in the search bar it is an arrayContains on the keywords array.
With so many products, I plan to paginate the query results as I fetch the documents. The issue is that I need to know the number of results in order to display the total to the user.
I have read that a query is charged one read, and then each document you fetch is charged a further read. Is there a way of getting the number of results for a query without fetching all the documents?
I have seen this answer here:
Get size of the query in Firestore
But in this example they say to use a counter which doesn't seem practical as I am using a query on keyword when the user searches and I am using a mixture of categories, subcategories when the user filters.
Thanks
With so many products, I plan to paginate the query results as I fetch the documents.
That's a very good decision, since getting 10000-20000 foods (documents) at once is not an option. First, the cost: it would be quite expensive. Second, you may get an OutOfMemoryError when trying to load such an enormous amount of data.
The issue is that I need to know the number of results in order to display the total to the user.
There is no way in Firestore to know the size of the result set in advance.
Is there a way of getting the number of results for a query without fetching all the documents?
No, you have to page through all the results returned by your query to get the total size.
But in this example they say to use a counter which doesn't seem practical as I am using a query on keyword when the user searches
That's correct: that solution doesn't fit your needs, since it solves the problem of storing the number of documents in a collection, not the number of documents returned by a query. As far as I know, providing that information just doesn't scale in the way this cloud-hosted NoSQL real-time database needs to "massively scale".
For any future lurker, a "solution" to this problem is to paginate the results until the query doesn't return any more documents. When the query snapshot is empty, return undefined for your cursor (here, an offset) and handle it from there:
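// Assumes an Express `app`, a Node.js Admin SDK `db` (admin.firestore()) and COLLECTION
// defined elsewhere; the '/items' route is hypothetical.
app.get('/items', async (req, res) => {
  const LIMIT = 100
  // Query-string values are strings; parse so that offset + LIMIT is numeric addition.
  const offset = parseInt(req.query.offset, 10) || 0
  // Note: Firestore still bills reads for documents skipped by offset().
  const results = await db.collection(COLLECTION)
    .offset(offset)
    .limit(LIMIT)
    .get()
  const docs = results.docs.map(doc => doc.data())
  res.status(200).send({
    data: docs,
    // Return the next offset, or undefined once no more docs are returned.
    offset: docs.length > 0 ? offset + LIMIT : undefined
  })
})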
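One caveat with the offset approach: Firestore's documentation states that documents skipped by offset() are still retrieved internally and billed as reads, whereas a cursor (startAfter() with the last document of the previous page) resumes without re-reading them. The rough arithmetic below compares the two when paging through a whole result set; the function names are hypothetical and it ignores the minimum one-read charge for a final empty query.

```javascript
// Reads billed when paging with offset(): every page pays for the
// skipped documents plus the documents it actually returns.
function offsetReads(total, pageSize) {
  let reads = 0
  for (let offset = 0; offset < total; offset += pageSize) {
    reads += offset + Math.min(pageSize, total - offset)
  }
  return reads
}

// Reads billed when paging with a cursor: each document is read once.
function cursorReads(total) {
  return total
}
```

For 10,000 matching documents in pages of 100, offsetReads(10000, 100) comes to 505,000 reads versus 10,000 for the cursor, and the gap grows quadratically with the number of pages.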

Do queries get triggered on any change or just when their result changes?

I was wondering whether Cloud Firestore queries that you listen to will fire for every change to that collection or just when the result of that specific query changes.
I am mainly interested in the behavior of the cloud_firestore Flutter plugin, however, I would assume that this is handled by Cloud Firestore in the back end.
Say, I have the following query:
Firestore.instance
.collection('scores')
.where('uid', isEqualTo: uid)
.orderBy('score', descending: true)
.limit(1)
.snapshots();
This is supposed to only return the highest score for a specific user.
I am wondering if the Stream returned from snapshots will fire for any change in the collection, e.g. when a lower score is added for the user, or just when the result changes, i.e. a higher score for this user is added.
Through empirical research I found out the following:
The query returns a result initially whether or not there are matches, i.e. even when the user has no documents in that collection.
Afterwards, it will only fire when the result changes, i.e. when the documents returned by that query are different.
This is the case for adding the first score for that user while the query is already listening, removing the highest score while listening, and adding a higher score to the query.
Consequently, each snapshot will contain either 1 or 2 DocumentChanges.
The query will completely ignore any documents added or removed that do not affect the result. This is also mandatory in order to scale real-time listening.
Furthermore, the documentation somewhat confirms this. It is not entirely clear what "contents change" at the beginning of the article refers to, or which documents (added, removed, or modified) are meant in the where-query section. However, it can be assumed that only the documents and content that affect the result are meant, which my testing has confirmed.
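The observed behaviour can be modelled as diffing consecutive result sets of the limit(1) query, with a snapshot emitted only when the diff is non-empty. This is a simplified model for illustration, not how the SDK or backend is implemented; `topScore` and `resultChanges` are hypothetical names.

```javascript
// Simplified model of the listener above (not the SDK's implementation):
//   where('uid', isEqualTo: uid).orderBy('score', descending: true).limit(1)
function topScore(docs, uid) {
  return docs
    .filter(d => d.uid === uid)
    .sort((a, b) => b.score - a.score)
    .slice(0, 1)
}

// Compare two result sets and list what a snapshot would report.
// An empty array means no snapshot would fire.
function resultChanges(before, after) {
  const beforeById = new Map(before.map(d => [d.id, d]))
  const afterById = new Map(after.map(d => [d.id, d]))
  const changes = []
  for (const d of before) {
    if (!afterById.has(d.id)) changes.push({ type: 'removed', id: d.id })
  }
  for (const d of after) {
    const old = beforeById.get(d.id)
    if (!old) changes.push({ type: 'added', id: d.id })
    else if (old.score !== d.score) changes.push({ type: 'modified', id: d.id })
  }
  return changes
}
```

A lower score (or another user's score) leaves the result unchanged and yields no changes; a higher score yields a removal plus an addition, matching the 1 or 2 DocumentChanges observed above.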

How can I restrict access to collections when using Firestore's REST API

I've noticed that when making a GET request to Firestore's REST API with a collection path, the response is the whole collection (I didn't check with a big collection).
I would like to know how I can limit the number of documents retrieved by such a request, for example to return only 15.
When using the list method to get documents in a collection, you can use the pageSize query parameter along with pageToken to limit the number of documents, as described in the documentation.
pageSize
The maximum number of documents to return.
pageToken
The nextPageToken value returned from a previous List request, if any.
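A minimal sketch of calling the list endpoint with pageSize, assuming Node 18+ (global fetch), the (default) database, and placeholder project and collection names. Authentication (an Authorization: Bearer token) is not shown, so as written it only works where security rules allow unauthenticated reads. To get the next page, pass the returned nextPageToken back as pageToken.

```javascript
// Build a Firestore REST "list documents" URL with pageSize/pageToken.
// projectId and collectionPath are placeholders supplied by the caller.
function buildListUrl(projectId, collectionPath, pageSize, pageToken) {
  const base = `https://firestore.googleapis.com/v1/projects/${projectId}` +
    `/databases/(default)/documents/${collectionPath}`
  const params = new URLSearchParams({ pageSize: String(pageSize) })
  if (pageToken) params.set('pageToken', pageToken)
  return `${base}?${params}`
}

// Fetch one page. The response JSON contains `documents` and,
// when more documents remain, `nextPageToken`.
async function listPage(projectId, collectionPath, pageSize, pageToken) {
  const response = await fetch(buildListUrl(projectId, collectionPath, pageSize, pageToken))
  if (!response.ok) throw new Error(`Firestore REST error: ${response.status}`)
  const body = await response.json()
  return { documents: body.documents ?? [], nextPageToken: body.nextPageToken ?? null }
}
```

For the example in the question, listPage('my-project', 'customers', 15) would request at most 15 documents.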

Resources