How to secure data using Firestore's rules.duration & rules.timestamp? - firebase

JS / Node.js solution:
How can I use Firestore's rules.duration and/or rules.timestamp, or other Firestore rules, to ensure that a document can be created at most once per day?
Put another way, a user should be able to create, for example, a comment/remark/tweet at most once daily. How can this be enforced using Firestore security rules?
For instance, on Monday (24 Dec 2018) I could write a new comment. On Tuesday (25 Dec 2018) I could write another new comment. But if I tried to write a second new comment on Tuesday (25 Dec 2018), it should NOT be allowed.
The solution should be able to work for daily, weekly, monthly, or quarterly.

Security rules don't have a sense of time, other than the current moment in time that some access occurred, and other timestamps in other documents. So you will have to use timestamps in other documents to gate access.
The only way I can think of to achieve this is in conjunction with Cloud Functions. You could have a single document per user that acts as a write location for new post data. Rules on that document would check that the user is doing two things:
Writing the current time (a server timestamp) into a known field.
Waiting at least the allowed interval since the last write of that field, i.e. the current time is not earlier than the previous timestamp plus that interval.
When the write is successful, a Cloud Function could trigger on that write, then copy the post data from other fields in that document into the final document where the post must live.
Or you could simplify things a bit, skip the security rules, and just have a Cloud Function that deletes incoming documents that don't satisfy your post frequency rules by querying for the two most recent posts from that user and checking their timestamps.
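The timestamp check described above can be sketched as a small pure function. This is only an illustration: the names canPost, Period, and PERIOD_MS are hypothetical, timestamps are assumed to be milliseconds since the epoch, and the window is assumed to be a rolling interval (24 hours, 7 days) rather than a calendar day. Monthly or quarterly limits would need calendar-aware logic that is not shown here.

```typescript
// Hypothetical helper: decides whether enough time has passed since the
// user's previous post. Assumes a rolling window, not calendar boundaries.
type Period = "daily" | "weekly";

const PERIOD_MS: Record<Period, number> = {
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
};

function canPost(lastPostMs: number | null, nowMs: number, period: Period): boolean {
  // A user with no previous post is always allowed to post.
  if (lastPostMs === null) {
    return true;
  }
  // Allow the post only when the full interval has elapsed.
  return nowMs - lastPostMs >= PERIOD_MS[period];
}
```

A Cloud Function using this approach would pass the timestamp of the user's previous post as lastPostMs and delete (or refuse to copy) the new document when the function returns false.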

Related

Hourly Backup Firestore Database

I need to back up my prod server Firestore DB hourly. I know about exportDocuments, but it incurs one read operation per document exported. I have more than 3 million documents, and the number is increasing day by day.
Is it possible to export docs that are added/updated in a given period like the last 1 hour?
I already have Cloud Scheduler + Cloud Pub/Sub + function-based backup system. It is backing up all the docs. It is costing too much.
If you need to schedule some operations in Firestore, you can consider using Cloud Scheduler, which allows you to schedule HTTP requests or Cloud Pub/Sub messages to Cloud Functions for Firebase that you deploy.
If you need to get the documents that are added/updated in a given period of time, like the last 1 hour, then don't forget to add a timestamp field to your documents. In this way, you can query based on that timestamp field.
To get docs that are added/updated in a given period like in the last 1 hour, add a field to the document, say lastUpdated, and keep its value current with every insert/update.
Then query for the incremental documents with something like where("lastUpdated", ">", lastExportTimestamp) from the backup function, lastExportTimestamp being a variable that holds the time of the last export (which may be stored in a separate collection).
See an example here.
Hope this clarifies, else leave a comment.
P.S. Please be advised that this approach may still need a full periodic backup (say daily), for ease of restore process, if/when required.
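As a rough sketch of the incremental selection described above, the following uses an in-memory array in place of a real Firestore query. The names planIncrementalExport, ExportPlan, and nextExportMs are hypothetical, and lastUpdated is assumed to be stored as milliseconds since the epoch. Against Firestore, the filter would instead be expressed as the where("lastUpdated", ">", lastExportTimestamp) query mentioned earlier.

```typescript
// Hypothetical shape of the result: which documents to export now, and
// which checkpoint to store for the next run.
interface ExportPlan<T> {
  changed: T[];
  nextExportMs: number;
}

function planIncrementalExport<T extends { lastUpdated: number }>(
  docs: T[],
  lastExportMs: number,
  nowMs: number,
): ExportPlan<T> {
  // Keep only documents touched after the previous export and up to now.
  // Anything stamped later than nowMs is picked up by the following run.
  const changed = docs.filter(
    (d) => d.lastUpdated > lastExportMs && d.lastUpdated <= nowMs,
  );
  return { changed, nextExportMs: nowMs };
}
```

Storing nextExportMs as the new lastExportTimestamp after a successful export keeps consecutive runs from skipping or re-exporting documents.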

What constitutes a write action in Firestore?

I'm currently developing a Flutter web application using Firestore for data persistence. The app is not live in production, so I'm the only one accessing this backend. There is only one collection that holds a single document, with many nested fields (6 levels deep). My understanding from looking at https://firebase.google.com/docs/firestore/pricing, is that reads are counted per doc, so every time I reload my app it should count as one read, yet in the last 4 hours since I started working today I already hit 1.7K reads (as reported in the usage tab). I know I haven't reloaded the app that many times, and there's also no hidden loop that calls the collection multiple times.
This is the Flutter code that calls Firestore:
final sourceRef = FirebaseFirestore.instance.collection("source");
final data = await sourceRef.doc("stats").get();
What am I missing please?
According to Firebase pricing, writes are defined as:
You are charged for each document read, write, and delete that you perform with Cloud Firestore.
Charges for writes and deletes are straightforward. For writes, each set or update operation counts as a single write.
Meaning that one document created is one write. If the same document is updated later, then Firebase counts it as one more write.
Here is a more detailed table that you can use for billing, and an example.
It is recommended to view individual product usage in the "Usage" tab for many products in the Firebase console, as this can narrow the product that is causing the elevated usage that you are seeing.
I would highly recommend logging the reads and writes in your application; that way, you can monitor how many writes and reads you perform.

Cloud Function Query Listener Based on Timestamp

I am building an application that must react when the timestamp for a certain Firestore document becomes older than the current time. Is there a way to set up this type of query listener as a Cloud Function, or otherwise achieve the desired goal of reacting to a document when its timestamp crosses the current time?
From what I can tell reading the Firestore and Cloud Functions documentation, query listeners may not be possible to set up as Cloud Functions. Furthermore, this is not just a regular query listener: the query criterion (time) is dynamic, so it isn't the typical query structure ("is A < 5") but a dynamic one ("is T < now", where "now" is changing every moment).
If it's true this is not possible as a query listener, I'd certainly appreciate any suggestions on how to achieve this goal through another means. One idea I had was to create a Cloud Function that triggers every 60 seconds and runs the queries based on the time at that moment, but this would not allow constant listening (and 60 seconds is unfortunately too long for our usage). Thank you so much in advance
Firestore queries can only filter on literal values that are explicitly stored in the documents they return. There's no way to perform a calculation in a query, so any time you need a now in the query - that timestamp will be calculated at the moment the query is created.
There are two common ways to implement the time-to-live type functionality that you describe:
Set up a process that periodically runs (e.g. a time-based Cloud Function), and every time the process runs perform a query to determine what documents have expired.
As a variant of this, you could start a permanent listener for updates each time the Cloud Function triggers and keep that active for slightly less than the interval until the next trigger.
Create a Cloud Task for each document that expires/triggers when the document needs to be processed. While this may seem more complex, it actually ends up being simpler due to the fact that your callbacks now trigger on individual documents.
Also see: Is there any TTL (Time To Live) for Documents in Firebase Firestore, which includes a link to Doug's excellent article on How to schedule a Cloud Function to run in the future with Cloud Tasks (to build a Firestore document TTL).
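Both approaches above reduce to simple timestamp arithmetic, sketched here without any Firebase or Cloud Tasks calls. The names Expiring, findExpired, and secondsUntilDue are hypothetical, and expiresAtMs is assumed to be milliseconds since the epoch. The first function is what a periodic process would run on each tick; the second computes the delay one might give a per-document task when it is scheduled.

```typescript
// Hypothetical document shape with an expiry timestamp.
interface Expiring {
  id: string;
  expiresAtMs: number;
}

// Periodic approach: on each run, pick every document whose time has passed.
function findExpired(docs: Expiring[], nowMs: number): Expiring[] {
  return docs.filter((d) => d.expiresAtMs <= nowMs);
}

// Per-document approach: how long a scheduled task should wait, in whole
// seconds, rounded up and never negative.
function secondsUntilDue(doc: Expiring, nowMs: number): number {
  return Math.max(0, Math.ceil((doc.expiresAtMs - nowMs) / 1000));
}
```

Note that "now" is passed in as a plain value on every call, which mirrors the point above that a query cannot compute the current time by itself.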

Firebase Firestore Read Costs - Clarification

I am using Firestore DB for an e-commerce app. I have a collection of products, each product has a document that has a "title" field and "search_keywords" field. The search keyword field stores an array. For example, if the title="apple", then the "search_keywords" field would store the following array: ["a","ap","app","appl","apple"]. When the user starts typing "apple" in the search box, I want to show the user, all products where "search_keywords" contains "a", then when they type the "p", I want to show all products where search keywords contain "ap"...and so on. Here is the snippet of code that gets called each time an additional letter is typed:
firebaseFireStore.collection("Produce").whereArrayContains("search_keywords", toSearch).get()
For example, in every case, the documents that would be returned on each successive call where an additional letter was typed would be a subset of what was returned in the previous call - it would just be a smaller list of documents - documents that were read on the previous query. My question is since the documents retrieved on a successive query are a subset of those retrieved in a prior query, would I be charged reads based on how many documents each successive query returns, or would Firestore have them in the cache and read them from there since the successive result set is a subset of a prior result set.
This question has been on my mind for a while and every time I search for it, I can't seem to find a clear answer. For example, based on my research, the following two posts on Stack Overflow have involved similar questions and the following are relevant quotes from there, but they seem to contradict each other because @AlexMamo says "it will always read the online version of the documents...[when online]" and @Doug Stevenson says "if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server...[it will get them from the cache]".
I would appreciate any clarification on this if anyone knows the answer. Thanks.
"If the OP has offline persistence enabled, which is by default in Cloud Firestore, then he will be able to read the cache only while offline. When the OP has internet connectivity, it will always read the online version of the documents." –
Alex Mamo (https://stackoverflow.com/a/69320068/14556386)
"According to this answer by Doug Stevenson, the reads are only charged when performed upon the server, not your local cache. That is if the local persistence is enabled on your client (it is by default) and the documents haven't been updated in the server."
(https://stackoverflow.com/a/61381656/14556386)
EDIT: In addition, if for each product document that was retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would it charge me for downloading that file on successive attempts to download it or would it recognize that I had previously downloaded that image and fetch it from cache automatically?
First of all, storing ["a", "ap", "app", "appl", "apple"] into an array and performing a whereArrayContains() query doesn't sound like a feasible idea. Why? Imagine you have a really big online shop with 100k products, of which 5k start with "a". Are you willing to pay 5k reads every time a user types "a"? That's a very costly feature.
Most likely you should return the corresponding documents when the user types, for example, two, or even three characters. You'll reduce costs enormously. Or you might take into consideration using the solution I have explained in the following article:
How to filter Firestore data cheaper?
Let's go forward.
For example, in every case, the documents that would be returned on each successive call where an additional letter was typed would be a subset of what was returned in the previous call, it would just be a smaller list of documents.
Yes, that's correct.
My question is since the documents retrieved on a successive query are a subset of those retrieved in a prior query, would I be charged reads based on how many documents each successive query returns?
Yes. You'll always be charged with a number of reads that is equal to the number of documents that are returned by your query. It doesn't matter if a query was previously performed, or not. Every time you perform a new query, you'll be charged with a number of reads that is equal to the number of documents you get.
For example, let's assume you perform this query:
.whereArrayContains("search_keywords", "a")
And you get the 100 documents, and right after that you perform:
.whereArrayContains("search_keywords", "ap")
And you get only 30 documents, you'll have to pay 130 reads, and not only 100. So it doesn't matter if the documents that are returned by the second query are a subset of the documents that are returned by the first query.
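The arithmetic above can be written out as a tiny sketch. The function name billedReads is hypothetical, and it assumes, as stated in this answer, that every get() query is billed for the number of documents it returns, regardless of overlap with earlier results.

```typescript
// Hypothetical helper: total reads billed for a sequence of get() queries,
// given how many documents each query returned.
function billedReads(resultSizes: number[]): number {
  // Overlap between result sets does not matter; each result is counted.
  return resultSizes.reduce((sum, size) => sum + size, 0);
}

// The "a" query returned 100 documents and the "ap" query returned 30.
const totalForExample = billedReads([100, 30]); // 130
```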
Or would Firestore have them in the cache and read them from there since the successive result set is a subset of a prior result set.
No, it won't. It will read those documents from the cache only if the user loses internet connectivity; otherwise it will always read the online versions of the documents that exist on the Firebase servers. The cached version of the documents works only when the user is offline. I have also written an article on this topic called:
How to drastically reduce the number of reads when no documents are changed in Firestore?
In Doug's answer:
Am I charged with read operations everytime the location is changed?
He clearly says:
You are charged for the number of documents read on the server every time you call get().
So if you called get(), you have to pay as reads, the number of documents that are returned.
The following statement:
If local persistence is enabled in your client (it is by default), then the documents may come from the cache if the documents are also not changed on the server.
applies when you are listening for real-time updates. According to the docs:
When you listen to the results of a query, you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed.
And I would add, if nothing has changed, you don't have to pay anything. Again, according to the same docs:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
So if the listener is active, you always read the documents from the cache. Bear in mind that a get() operation is different than listening for real-time updates.
if for each product document that was retrieved by the Firestore search, I download its corresponding image file from Firebase Storage. Would it charge me for downloading that file on successive attempts to download it or would it recognize that I had previously downloaded that image and fetch it from cache automatically?
You'll always be charged if you download the image over and over again unless you are using a library that helps you cache the images. For Android, there is a library called Glide:
Glide is a fast and efficient open-source media management and image loading framework for Android that wraps media decoding, memory and disk caching, and resource pooling into a simple and easy-to-use interface.

Best way to trigger function when data is being read. Google Cloud Functions

I am trying to figure out the best way to execute my Cloud Function for my Firestore database when data is being read.
I have a field on all of my documents with the timestamp of when the document was last used. This is used to delete documents that haven't been used in two days. The deletion is done by another Cloud Function.
I want to update this field when a document is used, i.e. read from my db. What would be the best way to do this?
onWrite(), onCreate(), onUpdate() and onDelete() are not an option.
My database is used by an Android app written in Kotlin.
There are no triggers for reading data. That would not be scalable to provide. If you require a last read time, you will have to control access to your database via some middleware component that you write, and have all readers query that instead. It will be responsible for writing the last read time back to the database.
Bear in mind that Firestore documents can only be written about once every second, so if you have a lot of access to a document, you may lose data.
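One way such a middleware component could cope with the write-rate concern is to skip last-read updates that arrive too soon after the previous one for the same document. The sketch below is only an illustration under that assumption: the class name LastReadTracker and its method shouldWrite are hypothetical, state is kept in memory in a single process, and no Firestore calls are made.

```typescript
// Hypothetical in-memory throttle: tells the middleware whether it should
// write a new last-read timestamp for a document right now.
class LastReadTracker {
  private readonly lastWriteMs = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number = 1000) {
    this.minIntervalMs = minIntervalMs;
  }

  shouldWrite(docId: string, nowMs: number): boolean {
    const previous = this.lastWriteMs.get(docId);
    // Skip the write when the previous one for this document is too recent.
    if (previous !== undefined && nowMs - previous < this.minIntervalMs) {
      return false;
    }
    // Remember this write so later reads can be compared against it.
    this.lastWriteMs.set(docId, nowMs);
    return true;
  }
}
```

Since the field is only used to delete documents unused for two days, a much larger interval than one second would likely be acceptable and would reduce write costs further.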

Resources