I've built a real-time web editor, similar to the one at firepad.io, for use in a web app.
I'm using a Firestore backend, utilizing its real-time sync capabilities (great!).
At the moment I store the content of my text editor as a document, and the problem is that every time the content changes, I have to make a WRITE for the entire document in Firestore. I think it's a waste of network bandwidth, and it's probably costly as well (I haven't done the cost evaluation yet).
Is there a way to make partial updates to a document in firestore?
Thank you.
You can update individual fields, with each value being written in its entirety. That's the smallest write you can make. But a document write of any size is still billed as 1 write operation.
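For example, here is a minimal sketch using the namespaced (v8-style) web SDK; the document path and the idea of splitting the editor content into per-section fields are my own assumptions, not something the question describes:

```typescript
import firebase from "firebase/app";
import "firebase/firestore";

// Hypothetical layout: one field per editor section instead of one big blob.
const docRef = firebase.firestore().collection("editors").doc("my-doc");

// update() sends only the named field over the network; the rest of the
// document is untouched. Dot notation reaches into nested maps. Note that
// this still bills as 1 write operation, whatever the field's size.
docRef.update({ "sections.intro": "Updated text for just this section" });
```

Splitting the content into smaller fields (per section, paragraph, or operation) is how you'd reduce the bandwidth per edit; the billed write count stays the same.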
Related
I have a specific feature I'm using that requires many (thousands) of small pieces of indexed data. Rather than fetching a thousand documents per startup and incurring unnecessary costs, I would like to simply download the whole giant document at once, and merge changes by key.
This means the document might approach the 1MB limit.
I'm curious about bandwidth, though. I'm wondering whether Firestore intelligently sends/receives only the most economical portion of the document. For example, if I have 2000 entries in this one document and I update one using {merge: true}, how much bandwidth will my browser use? Would it use only what's needed, sending only the merged part rather than merging it in the background and sending the whole document?
And what about onSnapshot? For example, if I'm listening for changes on this large document and it changes and the new document is downloaded, is the onSnapshot logic behind the scenes smart enough to download only the necessary (changed) portion of the document, rather than the full 1MB?
My users will be on data and I don't want to waste their data.
Thanks!
When you call documentRef.set(..., { merge: true }) the Firestore SDK sends exactly what you pass as ... to the server. The same happens when you call it without { merge: true }.
An onSnapshot listener always receives the complete document, regardless of what or how much has changed in that document.
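To make that concrete, here's a hedged sketch with the namespaced (v8-style) web SDK; the 'data/giant' path and entry names are hypothetical stand-ins for the 2000-entry document from the question:

```typescript
import firebase from "firebase/app";
import "firebase/firestore";

const bigDocRef = firebase.firestore().collection("data").doc("giant");

// The SDK sends only what you pass: here, a single nested entry.
bigDocRef.set({ entries: { entry1042: { score: 7 } } }, { merge: true });

// But every listener still receives the complete document on each change,
// so a ~1MB document costs ~1MB of download per update.
bigDocRef.onSnapshot((snap) => {
  console.log("approx bytes received:", JSON.stringify(snap.data() ?? {}).length);
});
```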
So by merging the many small documents into a single document, you are trading the cost of document reads for a cost in bandwidth consumption. Whether this trade-off is worth it depends entirely on your use-case and data. I recommend using the pricing calculator to determine the exact values.
I am working on a video based app that keeps track of how many views that video has received. I originally planned on having a field for view_count in my document that I would write to after someone watches a video.
However, knowing how many writes that could end up leading to, I started to wonder whether it's possible to see a breakdown of how many reads have been made for each document in a collection, and to use that number instead. Since the videos are short, I figured this would be an accurate number for the view count.
Is it possible to access this kind of data?
Firestore does not expose any per-document access metrics. The available monitoring options are shown on this page on monitoring usage.
If you want something beyond that you'll have to build it yourself, as you originally intended.
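If you do build it yourself, here is a hedged sketch of the original plan, using the namespaced (v8-style) web SDK; the 'videos' collection and the recordView function are illustrative names:

```typescript
import firebase from "firebase/app";
import "firebase/firestore";

function recordView(videoId: string): Promise<void> {
  return firebase
    .firestore()
    .collection("videos")
    .doc(videoId)
    .update({
      // FieldValue.increment() bumps the counter atomically on the server,
      // without a read-modify-write round trip.
      view_count: firebase.firestore.FieldValue.increment(1),
    });
}
```

Keep the ~1 write per second per document guideline in mind: if a popular video can be viewed more often than that, the usual pattern is a distributed counter (several shard documents whose values you sum on read).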
I'm working on some posting forum projects and trying to figure out the ideal Firestore database structure.
I read that documents have a maximum size of 1 MB, but what are the pros and cons of maxing out the storage space of each document by storing multiple posts in a document, rather than using a single document for each post?
I think it would be cheaper. Assuming that the app would make use of all the data in a document, the bandwidth costs would be the same, but rather than paying for multiple reads, I would be charged for only one document. Does this make sense?
Would it also be faster?
You can likely store many posts in a single document, and depending on your application, there may be good reasons for doing so. Just keep a few things in mind:
Firestore always reads complete documents. So if you store 100 posts in a single 1MB document but only need to display 10 of those posts, you may have reduced the read operations by 10x, but you've also increased the bandwidth consumption by 10x. And your mobile users will likely also pay for that bandwidth.
Implementing your own sharding strategy is not always hard, but it's seldom related to application functionality.
My guidelines when modeling data in any NoSQL database are:
model application screens in your database
I tend to model the data in my database after the screens that I have in my application. So if you typically show a list of headlines of recent articles when a user starts the app, I might actually create a document that contains just the headlines of recent articles. That way the app only has to read a single document with just the headlines, instead of having to read each individual post. This reduces not only the number of documents the app needs to read, but also the bandwidth it consumes.
don't be afraid to duplicate data
This goes hand-in-hand with the previous guideline, and is very normal across all NoSQL databases, but goes against the core of what many of us have learned from relational databases. It is sometimes also referred to as denormalizing, as it counters the database normalization of relational database models.
Continuing the example from before: you'll probably have a separate document for each post, just to make sure that each post has its own single point of definition. But you'll store parts of that post in many other places, such as in the document-of-recent-headlines that we had before. This means that we'll have to duplicate the data for each new post into that document, and possibly multiple other places. This process is known as fan-out, and there are some common strategies for updating this denormalized data (there's a small sketch after these guidelines).
I find that this duplication causes no problems, as long as it is clear what the main point of definition for each entity is. So in our example: if there ever is a difference between the headline of a post in the post-document itself and the document-of-recent-headlines, I know that I should update the document-of-recent-headlines, since the post-document itself is my point of definition for the post.
The result of all this is that I often see my database as part actual data storage, part prerendered fragments of application screens. As long as the points of definition are clear, that works quite well and allows me to define data models that scale efficiently both for users of the applications that consume the data and for the cost to operate them.
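To make the fan-out idea concrete, here is a hedged sketch using the namespaced (v8-style) web SDK; the 'posts' collection and the 'feeds/recent-headlines' document are illustrative names, not a prescribed structure:

```typescript
import firebase from "firebase/app";
import "firebase/firestore";

async function createPost(postId: string, title: string, body: string) {
  const db = firebase.firestore();
  const batch = db.batch();

  // Point of definition: the full post document.
  batch.set(db.collection("posts").doc(postId), { title, body });

  // Fan-out: duplicate just the headline into the prerendered feed document
  // that the app reads on startup.
  batch.set(
    db.collection("feeds").doc("recent-headlines"),
    { headlines: { [postId]: title } },
    { merge: true }
  );

  // Both writes commit atomically, so the feed can't drift from the posts
  // on a partial failure.
  await batch.commit();
}
```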
To learn more about NoSQL data modeling:
NoSQL data modeling
Getting to know Cloud Firestore, which contains many more examples of these prerendered application screens.
I need to store a large number of fields, like for a star rating system, but Firestore only allows 20,000 fields per document. Is there a known way around this? Right now I am going to 'shard' the fields across multiple documents, and keep the size of each document in a documentSizeTracker document that I use to determine which document to shard to (adding to the counter with a transaction). Is this the correct approach? Any problems with it?
Sharding certainly could work. It's hard to say without knowing exactly what kind of data you'll need from your document, and when, but that's certainly a reasonable option. You could also consider having a parent "summary" doc that contains fields you might want to search on and then split all of your data into several documents inside a subcollection of that parent.
One important nuance here: the limit isn't 20,000 fields, but 20,000 indexed fields. So if you're storing a bunch of data inside your document, but you know that you're not going to be searching on all of them, another alternative is to mark some of your fields as unindexed (which you can now do in the Firebase console in the "Exemptions" section).
If you're dealing with thousands of fields, though, you probably won't want to exempt them all one at a time, so a better alternative might be to place your data as a map inside a container field (named something like "allOfMyData"), then just mark that one field as unindexed. That will automatically remove all indexes from any fields contained inside that map.
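A hedged sketch of that container-field idea, using the namespaced (v8-style) web SDK; the 'ratings/shard-0' path and the rating values are made up, while the 'allOfMyData' field name comes from the answer above:

```typescript
import firebase from "firebase/app";
import "firebase/firestore";

const ratingsRef = firebase.firestore().collection("ratings").doc("shard-0");

ratingsRef.set({
  allOfMyData: {
    // Thousands of keys can live under this one map field. Once you add a
    // single-field indexing exemption for 'allOfMyData' in the console,
    // none of these nested fields are indexed.
    user_1: 5,
    user_2: 3,
  },
});
```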
Actually, I ran into a similar problem with read and write costs in Firebase. So, here is my conclusion:
# If something small needs to be written and read very often, use the Firebase Realtime Database
- The Firebase Realtime Database allows fast writes, but limits concurrent users to 100,000.
- Firestore allows a maximum of 1 write per second per document.
- It's very expensive in Firestore to read a document that only contains, for example, a rating.
# If something (larger) needs to be read very often, with writes usually more than 1 second apart, use Firestore
- Firestore allows up to 1,000,000 concurrent users as of the current beta release (they might raise this).
- It's cheaper to read a large document (under the 1 MiB limit) in Firestore than in the Firebase Realtime Database.
# If your model doesn't fit into either of these choices, modify it and split it into 2 models:
- 1 very small model to store in the Firebase Realtime Database (ratings, for example)
- 1 larger model to store in Firestore
Note: you can use both the Firebase Realtime Database and Firestore in the same project. Don't forget to take into account the billing differences between the two databases, and their different limits. I believe it's best to combine them and use the strengths of each, instead of trying to force solutions into one of them.
Note 2: I really didn't like the sharding idea suggested as a Firestore workaround.
I am trying to figure out the best way to execute my Cloud Function for my Firestore database when data is being read.
I have a field on all of my documents with the timestamp of when the document was last used, this is used to delete documents that haven't been used in two days. The deletion is done by another cloud function.
I want to update this field when the document is being used, i.e. read from my database. What would be the best way to do this?
onWrite(), onCreate(), onUpdate() and onDelete() are not options.
My database is used by an Android app written in Kotlin.
There are no triggers for reading data. That would not be scalable to provide. If you require a last read time, you will have to control access to your database via some middleware component that you write, and have all readers query that instead. It will be responsible for writing the last read time back to the database.
Bear in mind that Firestore documents can only be written about once every second, so if you have a lot of access to a document, you may lose data.
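As a hedged sketch of that middleware idea, here is a callable Cloud Function (in TypeScript) that performs the read and stamps the last-used time on the way out; the 'items' collection, 'lastUsed' field, and function name are assumptions, not part of the question:

```typescript
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

// The app calls this instead of reading Firestore directly.
export const readDocument = functions.https.onCall(async (data, context) => {
  const ref = admin.firestore().collection("items").doc(data.id);
  const snap = await ref.get();

  // Record the access time. Remember the ~1 write/second/document guidance:
  // under heavy read traffic you'd want to debounce or batch these updates.
  await ref.update({ lastUsed: admin.firestore.FieldValue.serverTimestamp() });

  return snap.data();
});
```

On the Kotlin side, the app would then invoke this via the Functions client SDK (getHttpsCallable("readDocument")) instead of querying Firestore directly.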