Regarding Firebase reads and writes

I have some questions regarding Firebase which I think many beginners have.
Let's say I have this query:
var collecRef = FirebaseFirestore.instance.collection('aCollection').where('a', isEqualTo: 'b').orderBy(/* some more code */);
If I execute this, how many reads will it cost if:
There are 5 documents which match the condition (a == b)?
There are no documents which match the condition?
Now, if I update the data in a document using setData() with merge: true, would it cost a write even if the data is unchanged? For example, a document stores a user's name, and in my app users can change their names.
If a user updates their name with setData() but hasn't entered a DIFFERENT name (the name is the same), would it still cost a write?

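One document received from a query costs one read. The conditions don't matter, and the size of the collection doesn't matter; only the number of documents received. The one exception is a query that matches nothing: Firestore still charges a minimum of one read for it, so your first case costs 5 reads and your second case costs 1 read.
One call to setData costs one write. It doesn't matter what you write or what the document currently contains, so rewriting an unchanged name is still one write.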
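To make those two rules concrete, here is a toy Java model of the billing. It is only a sketch under stated assumptions: FirestoreBillingModel and its methods are hypothetical names, not part of the Firestore SDK, and the model encodes just the documented rules of one read per returned document (with a minimum of one read per query) and one write per set/update call, regardless of the data written.

```java
// Toy model of Firestore read/write billing (hypothetical class, not the Firestore SDK).
final class FirestoreBillingModel {
    private long reads = 0;
    private long writes = 0;

    // One query: one read per returned document, and at least one read even if nothing matches.
    void recordQuery(int documentsReturned) {
        reads += Math.max(1, documentsReturned);
    }

    // One set()/setData()/update() call: one write, even if the stored values do not change.
    void recordWrite() {
        writes += 1;
    }

    long reads() { return reads; }

    long writes() { return writes; }

    public static void main(String[] args) {
        FirestoreBillingModel bill = new FirestoreBillingModel();
        bill.recordQuery(5); // 5 documents match a == b
        bill.recordQuery(0); // no document matches
        bill.recordWrite();  // setData with merge, same name as before
        System.out.println("reads=" + bill.reads() + " writes=" + bill.writes());
    }
}
```

Running main prints reads=6 writes=1 for the three cases in the question: 5 reads, 1 read, and 1 write.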

Related

How to structure data in Firestore for multiple user entries

This is my first time using a NOSQL database and I'm really struggling to work out how to structure my data.
I have an app that predicts a user's mood, and then the user can select whether that's right or not. So I need to save both the prediction and the actual result. I want to be able to pull the latest result from Firebase and display it in the app.
I understand how I'd do this on an SQL DB and understand how to write an SQL query to get that data back out.
For my Firebase DB I thought of the following structure:
The document name is the user's ID, and the document stores multiple arrays keyed by timestamp. But I can't seem to use orderBy on a document, only on a collection, so I'm not sure how to get the latest entry back.
The fact that this seems so difficult leads me to believe I've implemented the DB wrong to begin with.
Structure of DB is as follows:
I should add that it all works fine for the USER_TABLE, as it's one document ID and a single entry, so I've no problem retrieving that.
Thanks for your help!
orderBy is an instruction to the database to order documents on the server before it returns them to your app. To sort the entries inside a document, do that in your application code after it receives the document(s).
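A minimal sketch of that client-side sort, assuming the document's data arrives as a Map<String, Object> (as DocumentSnapshot.getData() returns on Android) and that each entry is stored as a map with hypothetical timestamp (epoch milliseconds as a number), predicted and actual fields. MoodEntries is an illustrative name, not an SDK type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sorts the entries of ONE already-fetched document on the client.
// Assumption: entries are stored as maps with hypothetical fields
// "timestamp" (epoch millis as a number), "predicted" and "actual".
final class MoodEntries {
    static final class Entry {
        final long timestamp;
        final String predicted;
        final String actual;

        Entry(long timestamp, String predicted, String actual) {
            this.timestamp = timestamp;
            this.predicted = predicted;
            this.actual = actual;
        }
    }

    // Turns the raw document data (e.g. from getData()) into entries, newest first.
    static List<Entry> newestFirst(Map<String, Object> documentData) {
        List<Entry> entries = new ArrayList<>();
        for (Object value : documentData.values()) {
            if (!(value instanceof Map)) {
                continue; // skip fields that are not entries
            }
            Map<?, ?> raw = (Map<?, ?>) value;
            entries.add(new Entry(
                    ((Number) raw.get("timestamp")).longValue(),
                    (String) raw.get("predicted"),
                    (String) raw.get("actual")));
        }
        entries.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
        return entries;
    }
}
```

The first element of the returned list is the latest result. If the timestamps are stored as Firestore Timestamp values rather than plain numbers, they arrive as com.google.firebase.Timestamp objects and the conversion would need adjusting.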
There is nothing inherently wrong with storing these entries in a single document; just keep in mind that:
A document can be at most 1 MiB in size, so make sure this fits your maximum number of entries.
Firestore only ever returns full documents, so you will either get all entries in a document, or none of them.
You won't be able to order or filter the entries inside a single document. If that is a requirement for you, consider storing each entry in its own document in a subcollection. Note that this will increase the number of documents each user reads though, which will increase the cost.
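To judge whether your entries fit under the size limit, a rough estimate can be computed from Firestore's storage-size rules. This sketch assumes those rules as we understand them (a string takes its UTF-8 bytes plus 1, integers and timestamps 8 bytes, a map the sum of its keys and values, a document its name size plus field sizes plus 32 bytes, and a document name the sizes of its collection and document IDs plus 16 bytes), together with a hypothetical entry layout and a hypothetical moods/{uid} path. Treat the result as an estimate, not a guarantee.

```java
import java.nio.charset.StandardCharsets;

// Rough capacity estimate for storing many entries in ONE document,
// based on the storage-size rules as we understand them (see lead-in).
final class DocSizeEstimate {
    static final int MAX_DOCUMENT_BYTES = 1_048_576; // 1 MiB

    // String size: UTF-8 encoded bytes + 1.
    static int stringSize(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length + 1;
    }

    // Hypothetical layout: field <entryKey> -> {timestamp: int, predicted: string, actual: string}.
    static int entrySize(String entryKey, String predicted, String actual) {
        int mapSize = stringSize("timestamp") + 8
                + stringSize("predicted") + stringSize(predicted)
                + stringSize("actual") + stringSize(actual);
        return stringSize(entryKey) + mapSize; // field name + map value
    }

    // Name size for a top-level document such as moods/{uid}.
    static int documentNameSize(String collectionId, String documentId) {
        return stringSize(collectionId) + stringSize(documentId) + 16;
    }

    // How many equally sized entries fit after the name and the 32-byte overhead.
    static int maxEntries(int documentNameSize, int bytesPerEntry) {
        return (MAX_DOCUMENT_BYTES - documentNameSize - 32) / bytesPerEntry;
    }
}
```

Under these assumptions, an entry keyed by a 13-digit timestamp with predicted "happy" and actual "sad" takes 59 bytes, so a document with a 28-character ID could hold roughly 17,000 such entries; longer strings shrink that number quickly.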

Firebase Firestore Read Costs - Clarification

I am using Firestore DB for an e-commerce app. I have a collection of products, each product has a document that has a "title" field and "search_keywords" field. The search keyword field stores an array. For example, if the title="apple", then the "search_keywords" field would store the following array: ["a","ap","app","appl","apple"]. When the user starts typing "apple" in the search box, I want to show the user, all products where "search_keywords" contains "a", then when they type the "p", I want to show all products where search keywords contain "ap"...and so on. Here is the snippet of code that gets called each time an additional letter is typed:
firebaseFireStore.collection("Produce").whereArrayContains("search_keywords", toSearch).get()
For example, in every case, the documents returned on each successive call (after an additional letter is typed) would be a subset of what was returned by the previous call: a smaller list of documents that were already read by the previous query. My question is: since the documents retrieved by a successive query are a subset of those retrieved by a prior query, would I be charged reads based on how many documents each successive query returns, or would Firestore have them in the cache and read them from there, since the successive result set is a subset of a prior result set? This question has been on my mind for a while, and every time I search for it I can't find a clear answer. Based on my research, the following two Stack Overflow posts involve similar questions, but the relevant quotes from them seem to contradict each other: Alex Mamo says "it will always read the online version of the documents...[when online]", while Doug Stevenson says "if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server...[it will get them from the cache]". I would appreciate any clarification on this if anyone knows the answer. Thanks.
"If the OP has offline persistence enabled, which is by default in Cloud Firestore, then he will be able to read the cache only while offline. When the OP has internet connectivity, it will always read the online version of the documents." –
Alex Mamo (https://stackoverflow.com/a/69320068/14556386)
"According to this answer by Doug Stevenson, the reads are only charged when performed upon the server, not your local cache. That is if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server."
(https://stackoverflow.com/a/61381656/14556386)
EDIT: In addition, for each product document retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would I be charged for downloading that file on successive attempts, or would Firebase recognize that I had previously downloaded that image and fetch it from the cache automatically?
First of all, storing ["a", "ap", "app", "appl", "apple"] in an array and performing a whereArrayContains() query doesn't sound like a feasible idea. Why? Imagine you have a really big online shop with 100k products, of which 5k start with "a". Are you willing to pay 5k reads every time a user types "a"? That's a very costly feature.
Most likely you should return the corresponding documents when the user types, for example, two, or even three characters. You'll reduce costs enormously. Or you might take into consideration using the solution I have explained in the following article:
How to filter Firestore data cheaper?
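The first suggestion (not storing or querying one- and two-letter prefixes, and only querying once the user has typed a few characters) can be sketched in Java. This is a minimal sketch assuming a hypothetical threshold of 3 characters: the search_keywords array starts at that length, and the client only issues whereArrayContains("search_keywords", text) when the typed text is at least that long. PrefixKeywords is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: prefix keywords that start at MIN_QUERY_LENGTH characters,
// plus a check that avoids querying Firestore for very short input.
final class PrefixKeywords {
    static final int MIN_QUERY_LENGTH = 3; // assumed threshold, tune for your catalog

    // e.g. "Apple" -> [app, appl, apple]; one- and two-letter prefixes are never stored.
    static List<String> keywordsFor(String title) {
        String normalized = title.trim().toLowerCase(Locale.ROOT);
        List<String> prefixes = new ArrayList<>();
        for (int end = MIN_QUERY_LENGTH; end <= normalized.length(); end++) {
            prefixes.add(normalized.substring(0, end));
        }
        return prefixes;
    }

    // Only typed text at least MIN_QUERY_LENGTH long is worth a whereArrayContains() query.
    static boolean shouldQuery(String typed) {
        return typed.trim().length() >= MIN_QUERY_LENGTH;
    }
}
```

The trade-off is that titles shorter than the threshold (for example "tv") get no keywords at all, so they would need separate handling.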
Let's go forward.
For example, in every case, the documents that would be returned on each successive call where an additional letter was typed would be a subset of what was returned in the previous call, it would just be a smaller list of documents.
Yes, that's correct.
My question is since the documents retrieved on a successive query are a subset of those retrieved in a prior query, would I be charged reads based on how many documents each successive query returns?
Yes. You'll always be charged with a number of reads that is equal to the number of documents that are returned by your query. It doesn't matter if a query was previously performed, or not. Every time you perform a new query, you'll be charged with a number of reads that is equal to the number of documents you get.
For example, let's assume you perform this query:
.whereArrayContains("search_keywords", "a")
And you get the 100 documents, and right after that you perform:
.whereArrayContains("search_keywords", "ap")
And you get only 30 documents, you'll have to pay 130 reads, and not only 100. So it doesn't matter if the documents that are returned by the second query are a subset of the documents that are returned by the first query.
Or would Firestore have them in the cache and read them from there since the successive result set is a subset of a prior result set.
No, it won't. It will read those documents from the cache only if the user loses internet connectivity; otherwise it will always read the online versions of the documents that exist on the Firebase servers. The cached version of the documents is only used when the user is offline. I have also written an article on this topic called:
How to drastically reduce the number of reads when no documents are changed in Firestore?
In Doug's answer:
Am I charged with read operations every time the location is changed?
He clearly says:
You are charged for the number of documents read on the server every time you call get().
So if you called get(), you have to pay as reads, the number of documents that are returned.
The same answer also contains this statement:
If local persistence is enabled in your client (it is by default), then the documents may come from the cache if the documents are also not changed on the server.
That applies when you are listening for real-time updates. According to the docs:
When you listen to the results of a query, you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed.
And I would add: if nothing has changed, you don't have to pay anything, as long as the listener stays connected. Again, according to the same docs:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
So while the listener is active, unchanged documents are served from the local cache and you are not charged for them again. Bear in mind that a get() operation is different from listening for real-time updates.
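The billing rules quoted above can be summarized in a small Java model. ListenerBillingModel is a hypothetical class, not an SDK type; it counts the initial snapshot, later added/updated/removed documents, and a full re-read after more than 30 minutes disconnected, and it deliberately ignores other details such as the minimum charge for empty results.

```java
// Toy model of snapshot-listener billing as described by the docs quoted above
// (hypothetical class; ignores the one-read minimum for empty results).
final class ListenerBillingModel {
    private static final long RESUME_WINDOW_MILLIS = 30L * 60 * 1000; // 30 minutes
    private long reads = 0;
    private int resultSetSize = 0;
    private long disconnectedAtMillis = -1;

    // Initial snapshot: every document in the result set counts as "added".
    void start(int initialResults) {
        resultSetSize = initialResults;
        reads += initialResults;
    }

    // Later snapshots: only added, updated, or removed (because changed) documents are billed.
    void onChanges(int added, int updated, int removed) {
        reads += added + updated + removed;
        resultSetSize += added - removed;
    }

    void disconnect(long nowMillis) {
        disconnectedAtMillis = nowMillis;
    }

    // After more than 30 minutes offline, reconnecting is billed like a brand-new query.
    void reconnect(long nowMillis) {
        if (disconnectedAtMillis >= 0 && nowMillis - disconnectedAtMillis > RESUME_WINDOW_MILLIS) {
            reads += resultSetSize;
        }
        disconnectedAtMillis = -1;
    }

    long reads() { return reads; }
}
```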
if for each product document that was retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would it charge me for downloading that file on successive attempts to download it or would it recognize that I had previously downloaded that image and fetch it from cache automatically?
You'll always be charged if you download the image over and over again unless you are using a library that helps you cache the images. For Android, there is a library called Glide:
Glide is a fast and efficient open-source media management and image loading framework for Android that wraps media decoding, memory and disk caching, and resource pooling into a simple and easy-to-use interface.
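If you prefer not to use a library, the idea behind such caching can be sketched with a small in-memory LRU cache. This is not Glide: ImageCache is a hypothetical class, and the download function is a placeholder for whatever actually fetches the bytes from Firebase Storage. Only cache misses trigger a download.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal in-memory LRU image cache (hypothetical; NOT Glide).
// `download` stands in for the real Firebase Storage fetch, keyed by storage path.
final class ImageCache {
    private final Function<String, byte[]> download;
    private final Map<String, byte[]> cache;
    private int downloads = 0;

    ImageCache(int maxEntries, Function<String, byte[]> download) {
        this.download = download;
        // accessOrder = true: iteration starts at the least recently used entry.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries; // evict the least recently used image
            }
        };
    }

    byte[] get(String path) {
        byte[] bytes = cache.get(path);
        if (bytes == null) {
            bytes = download.apply(path); // only a cache miss goes to the network
            downloads++;
            cache.put(path, bytes);
        }
        return bytes;
    }

    int downloads() { return downloads; }
}
```

A real image loader also caches on disk, so images survive app restarts; this sketch only keeps them in memory.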

What is the most cost-efficient method of making document writes/reads from Firestore?

Firebase's Cloud Firestore gives you limits on the number of document writes and reads (and deletes). For example, the Spark plan (free) allows 50K reads and 20K writes a day. Estimating how many writes and reads your app will make is obviously important during development, as you will want to know the potential costs incurred.
Part of this estimation is knowing exactly what counts as a document read/write. This part is somewhat unclear from searching online.
One document can contain many different fields, so if an app is designed such that user actions during a session require fields within a single document to be updated, would it be more cost-efficient to update all the fields in one single document write at the end of the session, rather than writing the document every single time the user wants to update one field?
Similarly, would it not make sense to read the document once at the start of a session, getting the values of all fields, rather than reading them when each is needed?
I appreciate that this method will lead to the user seeing slightly out-of-date field values, and admittedly the database won't be updated immediately, but if such things aren't much of a concern to you, couldn't such a method reduce your reads/writes by a large factor?
This all depends on what counts as a document write/read (does writing 20 fields within the same document in one go count as 20 writes?).
The cost of a write operation has no bearing on the number of fields you write. It's purely based on the number of times you call update() or set() on a document reference, whether independently, in a transaction, or in a batch.
If you choose to write each N fields using N separate updates, then you will be charged N writes. If you choose to write N fields using 1 update, then you will be charged 1 write.
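The session-batching idea from the question can be sketched as a small buffer that collects field changes locally and sends them as one update. SessionFieldBuffer is hypothetical, and writer stands in for a real call such as DocumentReference.update(Map<String, Object>) on Android.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Buffers field changes made during a session and sends them as ONE update.
// `writer` is a hypothetical stand-in for a call such as docRef.update(fields) on Android.
final class SessionFieldBuffer {
    private final Map<String, Object> pending = new HashMap<>();
    private final Consumer<Map<String, Object>> writer;
    private int writes = 0;

    SessionFieldBuffer(Consumer<Map<String, Object>> writer) {
        this.writer = writer;
    }

    // Record a change locally; a later change to the same field replaces the earlier one.
    void set(String field, Object value) {
        pending.put(field, value);
    }

    // Send all pending fields in a single write (one billed write, however many fields).
    void flush() {
        if (pending.isEmpty()) {
            return; // nothing changed, so no write is issued
        }
        writer.accept(new HashMap<>(pending));
        writes++;
        pending.clear();
    }

    int writes() { return writes; }
}
```

The trade-off is the one mentioned in the question: anything still pending is lost if the app is killed before flush() runs, and other clients see the changes only after the flush.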

Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also if posts can additionally be shared to groups and lists would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
postID: 'someID',
locations: ( <---- collection
"path/to/post/location1",
"path/to/post/location2",
...
)
}
This would mean that all writes to Firestore would basically need to go through Cloud Functions that can keep track of this data, for security reasons, unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape trying to keep all that up-to-date as it is shared across a platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you are duplicating data, there is one thing you need to keep in mind: in the same way you are adding data, you need to maintain it. In other words, if you want to update or delete an object, you need to do it in every place that it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all operations that we need to do in order to have consistent data, we add all operations to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that please see:
How to do a bulk update in Firestore
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass to each document() method, the corresponding document id in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents. You have to identify the documents by their ids.
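As a sketch of that reconstruction, assuming a tracker like the question's users_posts_boards yields (boardOwnerID, boardID) pairs for a post, the concrete paths can be built like this. DuplicatePaths and BoardRef are hypothetical names; on Android each resulting string could then be passed to FirebaseFirestore.getInstance().document(path) to obtain a DocumentReference.

```java
import java.util.ArrayList;
import java.util.List;

// Rebuilds the concrete path of every copy of a post from tracked (owner, board) pairs.
final class DuplicatePaths {
    static final class BoardRef {
        final String ownerId;
        final String boardId;

        BoardRef(String ownerId, String boardId) {
            this.ownerId = ownerId;
            this.boardId = boardId;
        }
    }

    // Layout from the question: users/[boardOwnerID]/boards/[boardID]/posts/[postID].
    // Every segment is a concrete ID, since Firestore document paths have no wildcards.
    static List<String> postCopyPaths(String postId, List<BoardRef> boards) {
        List<String> paths = new ArrayList<>();
        for (BoardRef board : boards) {
            paths.add(String.join("/", "users", board.ownerId, "boards", board.boardId, "posts", postId));
        }
        return paths;
    }
}
```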
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't consider that necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow reads and writes separately, or even apply rules to each individual operation (get, list, create, update, delete).
I'm basically looking for a sane way to track heavily denormalized data.
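The simplest way I can think of is to collect the operations in a key-value data structure. Let's assume we have a map from each document reference to the object that should be written there (keyed by reference rather than by object, so the same object can be written to many locations without overwriting map entries):
Map<DocumentReference, Object> map = new HashMap<>();
map.put(reference1, customObject1);
map.put(reference2, customObject2);
map.put(reference3, customObject3);
//And so on
Iterate through the map, add each entry to a batch, and commit the batch:
WriteBatch batch = FirebaseFirestore.getInstance().batch();
for (Map.Entry<DocumentReference, Object> entry : map.entrySet()) {
    batch.set(entry.getKey(), entry.getValue());
}
batch.commit();
That's it.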

How to query Firestore collection for documents with field whose value is contained in a list

I have two Firestore collections, Users and Posts. Below are simplified examples of what the typical document in each contains.
*Note that the document IDs in the friends subcollection are equal to the document ID of the corresponding user documents. Optionally, I could also add a uid field to the friends documents and/or the Users documents. Also, there is a reason not relevant to this question that we have friends as a subcollection to each user, but if need-be we change it into a unified root-level Friends collection.
This setup makes it very easy to query for posts, sorted chronologically, by any given user by simply looking for Posts documents whose owner field is equal to the document reference of that user.
I achieve this in iOS/Swift with the following, though we are building this app for iOS, Android, and web.
guard let uid = Auth.auth().currentUser?.uid else {
print("No UID")
return
}
let firestoreUserRef = firestore.collection("Users").document(uid)
firestorePostsQuery = firestore.collection("Posts").whereField("owner", isEqualTo: firestoreUserRef).order(by: "timestamp", descending: true).limit(to: 25)
My question is how to query Posts documents that have owner values contained in the user's friends subcollection, sorted chronologically. In other words, how to get the posts belonging to the user's friends, sorted chronologically.
For a real-world example, consider Twitter, where a given user's feed is populated by all tweets that have an owner property whose value is contained in the user's following list, sorted chronologically.
Now, I know from the documentation that Firestore does not support logical OR queries, so I can't just chain all of the friends together. Even if I could, that doesn't really seem like an optimal approach for anyone with more than a small handful of friends.
The only option I can think of is to create a separate query for each friend. There are several problems with this, however. The first is the challenge of presenting (in a smooth manner) the results from many asynchronous fetches. The second is that I can't merge the data into chronological order without re-sorting the set manually on the client every time one of the query snapshots is updated (i.e., a real-time update).
Is it possible to build the query I am describing, or am I going to have to go this less-than optimal approach? This seems like a fairly common query use-case, so I'll be surprised if there is not a way to do this.
Sorting chronologically is easy provided you are using a Unix timestamp, e.g. 1547608677790, with the .orderBy method. However, that leaves you with a potential mountain of queries to iterate through (one per friend).
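If you do go with one query per friend, the per-friend results do not need a full re-sort: each query can already return its posts ordered by timestamp (descending), and those sorted lists can be merged with a priority queue. This is a sketch; FeedMerge and Post are hypothetical types standing in for your decoded query results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Merges per-friend result lists, each already sorted newest-first, into one feed.
final class FeedMerge {
    static final class Post {
        final String owner;
        final long timestamp;

        Post(String owner, long timestamp) {
            this.owner = owner;
            this.timestamp = timestamp;
        }
    }

    static List<Post> mergeNewestFirst(List<List<Post>> perFriend, int limit) {
        // Heap entries are {listIndex, positionInList}; the newest head post is polled first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Long.compare(
                perFriend.get(b[0]).get(b[1]).timestamp,
                perFriend.get(a[0]).get(a[1]).timestamp));
        for (int i = 0; i < perFriend.size(); i++) {
            if (!perFriend.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Post> feed = new ArrayList<>();
        while (!heap.isEmpty() && feed.size() < limit) {
            int[] top = heap.poll();
            List<Post> list = perFriend.get(top[0]);
            feed.add(list.get(top[1]));
            if (top[1] + 1 < list.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that friend's list
            }
        }
        return feed;
    }
}
```

On every real-time update you would still re-run the merge, but merging k already-sorted lists costs roughly O(n log k) for n posts instead of a full sort.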
So, I think you want to re-think the data store schema.
Take advantage of Cloud Functions for Firebase triggers. When a new post is written, have a cloud function calculate who should see it. Each user could have an array-type property containing all unread posts, read posts, etc.
Something like that would be fast and least taxing.
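The fan-out-on-write idea can be modeled in memory to show where the cost moves. FanOutModel is hypothetical; in a real app onPostCreated would run inside a Cloud Functions trigger and each feed append would be a Firestore write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of fan-out on write (hypothetical; not Cloud Functions code).
final class FanOutModel {
    private final Map<String, Set<String>> followersByUser; // userId -> follower userIds
    private final Map<String, List<String>> feeds = new HashMap<>(); // userId -> unread post IDs
    private int writes = 0;

    FanOutModel(Map<String, Set<String>> followersByUser) {
        this.followersByUser = followersByUser;
    }

    // What the trigger would do when a post is created: append it to every follower's feed.
    void onPostCreated(String postId, String ownerId) {
        for (String follower : followersByUser.getOrDefault(ownerId, Set.of())) {
            feeds.computeIfAbsent(follower, k -> new ArrayList<>()).add(postId);
            writes++; // one feed write per follower
        }
    }

    List<String> feedOf(String userId) {
        return feeds.getOrDefault(userId, List.of());
    }

    int writes() { return writes; }
}
```

Reading a feed then costs one document read instead of one query per friend, while each new post costs one write per follower; the 1 MiB document limit also caps how many post IDs a single feed document can hold.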
