Hourly Backup Firestore Database - firebase

I need to back up my production Firestore database hourly. I know about exportDocuments, but it incurs one read operation per exported document. I have more than 3 million documents, and the number grows daily.
Is it possible to export only the documents that were added or updated in a given period, such as the last hour?
I already have a backup system built on Cloud Scheduler + Cloud Pub/Sub + a Cloud Function. It backs up all the documents, and that costs too much.

If you need to schedule some operations in Firestore, you can consider using Cloud Scheduler, which allows you to schedule HTTP requests or Cloud Pub/Sub messages to Cloud Functions for Firebase that you deploy.
If you need to get the documents that are added/updated in a given period of time, like the last 1 hour, then don't forget to add a timestamp field to your documents. In this way, you can query based on that timestamp field.

To get docs that are added/updated in a given period like in the last 1 hour, add a field to the document, say lastUpdated, and keep its value current with every insert/update.
Then query for incremental documents from the backup function with something like where("lastUpdated", ">", lastExportTimestamp), where lastExportTimestamp is the time of the last export (it can be stored in a separate collection).
See an example here.
Hope this clarifies, else leave a comment.
P.S. Please be advised that this approach may still need a full periodic backup (say daily), for ease of restore process, if/when required.
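The incremental selection described above can be sketched roughly as follows. This is an in-memory illustration only, not Firestore code: the sample array, selectChangedSince, and runIncrementalExport are hypothetical names, and timestamps are plain millisecond numbers. In a real backup function the filter would be the Firestore query mentioned in the answer.

```javascript
// Hypothetical in-memory stand-in for a collection. In Firestore the filter
// below corresponds to where("lastUpdated", ">", lastExportTimestamp).
function selectChangedSince(docs, lastExportMillis) {
  return docs.filter((doc) => doc.lastUpdated > lastExportMillis);
}

// Runs one incremental export and returns the changed docs plus the next
// checkpoint. The checkpoint is the time this run started, so writes that
// land while the export is in progress are picked up by the next run.
function runIncrementalExport(docs, lastExportMillis, nowMillis) {
  const changed = selectChangedSince(docs, lastExportMillis)
    .filter((doc) => doc.lastUpdated <= nowMillis);
  return { changed, nextCheckpoint: nowMillis };
}

const HOUR = 60 * 60 * 1000;
const now = Date.UTC(2024, 0, 1, 12, 0, 0);
const sampleDocs = [
  { id: "a", lastUpdated: now - 3 * HOUR },        // older than the last export
  { id: "b", lastUpdated: now - 30 * 60 * 1000 },  // changed in the last hour
  { id: "c", lastUpdated: now - 5 * 60 * 1000 },   // changed in the last hour
];
const result = runIncrementalExport(sampleDocs, now - HOUR, now);
console.log(result.changed.map((doc) => doc.id));
```

Storing nextCheckpoint only after the export succeeds means a failed run is simply retried over the same window on the next trigger.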

Related

Is there a way to limit the size of a collection in firebase firestore?

I am using a collection in Firebase Firestore to log some activities but I don't want this log collection to grow forever. Is there a way to set a limit to the number of documents in a collection or a size limit for the whole collection or get a notification if it passes a limit?
OR is there a way to automatically delete old documents in a collection just by settings and not writing some cron job or scheduled function?
Alternatively, what options are there to create a rotational logging system for client activities in Firebase?
"I don't want this log collection to grow forever."
Why not? There are no downsides in terms of query performance. In Firestore, performance depends on the number of documents you request, not on the number of documents you search. So it doesn't really matter whether you request 10 documents from a collection of 100 documents or from a collection of 100 million documents; the response time will be the same. As you can see, the number of documents within a collection is irrelevant.
"Is there a way to set a limit to the number of documents in a collection or a size limit for the whole collection or get a notification if it passes a limit?"
There is no built-in mechanism for that. However, you can build one yourself quite simply: create a document that holds a numeric value, and increment or decrement that value each time a document is added to or deleted from the collection. Once you hit the limit, you can restrict the addition of documents to that particular collection.
"OR is there a way to automatically delete old documents in a collection just by settings and not writing some cron job or scheduled function?"
There is also no automatic operation that can help you achieve that. You can either use the solution above and, once you hit the limit + 1, delete the oldest document, or you can use a Cloud Function for Firebase to achieve the same thing. I cannot see any reason why you should use a cron job. You can use Cloud Scheduler to perform some operation at a specific time, but as I understand it, you want this to happen automatically when you hit the limit.
"Alternatively, what options are there to create a rotational logging system for client activities in Firebase?"
If you still don't want to have larger collections, maybe you can export the data into a file and add that file to Cloud Storage for Firebase.
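The counter idea from the answer above, combined with deleting the oldest entry once the limit is exceeded, might look roughly like this. It is an in-memory sketch under stated assumptions: CappedLog is a hypothetical class, and in Firestore the count would live in a separate counter document that is updated together with each add or delete.

```javascript
// Hypothetical in-memory model of a capped log collection.
class CappedLog {
  constructor(limit) {
    this.limit = limit;
    this.count = 0;     // mirrors the numeric value kept in the counter document
    this.entries = [];  // oldest entry first
  }

  // Adds an entry. When the count exceeds the limit (limit + 1), the oldest
  // entries are removed until the count is back at the limit.
  add(entry) {
    this.entries.push(entry);
    this.count += 1;
    const removed = [];
    while (this.count > this.limit) {
      removed.push(this.entries.shift());
      this.count -= 1;
    }
    return removed;
  }
}

const log = new CappedLog(3);
["login", "view", "edit", "logout"].forEach((event) => log.add(event));
console.log(log.entries, log.count);
```

In a real deployment the increment and the eviction would need to happen atomically (for example inside a transaction or a Cloud Function), otherwise concurrent writers could push the collection past the limit.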

What is the best way to schedule tasks in a serverless stack?

I am using NextJS and Firebase for an application. The users are able to rent products for a certain period. After that period, a serverless function should be triggered which updates the database etc. Since NextJS is event-driven, I cannot seem to figure out how to schedule a task that executes when the rental period ends and updates the database.
Perhaps cron jobs handled elsewhere (Easy Cron etc) are a solution. Or maybe an EC2 instance just for scheduling these tasks.
Since this is marked with AWS EC2, I've assumed it's OK to suggest a solution with AWS services in mind.
What you could do is leverage DynamoDB's speed and sort capabilities. If you define a table with both a partition key and a range key, the data within a partition is automatically sorted by the range key in UTF-8 byte order. This means ISO timestamp values can be used to sort data chronologically.
With this in mind, you could design your table to have a partition key of a global, constant value across all users (to group them all) and a sort key of isoDate#userId, while also creating a GSI (Global Secondary Index) with the userId as the partition key and the isoDate as the range key.
With your data sorted, you can use the BETWEEN query to extract the entries that fit to your time window.
Schedule a single Lambda to run every minute (or so), extract the entries that are about to expire, and notify the affected users.
Important note: This sorting method works when ALL range keys have the same size, due to how sorting with the UTF-8 works. You can easily accomplish this if your application uses UUIDs as ids. If not, you can simply generate a random UUID to attach to the isoTimestamp, as you only need it to avoid the rare exact time duplicity.
Example: let's say you want to extract all entries expiring around 2022-10-10T12:00:00.000Z.
Your query would be BETWEEN 2022-10-10T11:59:00.000Z#00000000-0000-0000-0000-000000000000 AND 2022-10-10T12:00:59.999Z#zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz
The timestamps could be a little off, but you get the idea. 00... is the lowest-sorting UUID value, and zz... (or ff...) sorts after every UUID, so together the two bounds bracket all ids.
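The key construction from the example above can be sketched as follows. The function names sortKey and expiryWindow are hypothetical, the sample id is made up, and no DynamoDB call is made: the snippet only builds the two bounds that would be passed to a BETWEEN key condition and shows that plain string comparison orders ASCII keys the same way.

```javascript
// Smallest and largest suffixes used to bracket every possible id, taken from
// the example above (any id made of hex digits sorts between them).
const MIN_ID = "00000000-0000-0000-0000-000000000000";
const MAX_ID = "zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz";

// Builds the sort key for one entry: fixed-width ISO timestamp, "#", then id.
function sortKey(date, id) {
  return `${date.toISOString()}#${id}`;
}

// Returns the lower and upper bounds for a BETWEEN condition covering
// [center - beforeMs, center + afterMs].
function expiryWindow(center, beforeMs, afterMs) {
  return {
    lower: sortKey(new Date(center.getTime() - beforeMs), MIN_ID),
    upper: sortKey(new Date(center.getTime() + afterMs), MAX_ID),
  };
}

const center = new Date("2022-10-10T12:00:00.000Z");
const range = expiryWindow(center, 60000, 59999);
console.log(range.lower);
console.log(range.upper);

// For ASCII-only keys, string comparison mirrors the range key ordering.
const key = sortKey(
  new Date("2022-10-10T12:00:30.000Z"),
  "3f2b8c1e-9d4a-4c6b-8e7f-1a2b3c4d5e6f" // made-up id for illustration
);
console.log(range.lower <= key && key <= range.upper);
```

Because toISOString always produces the same fixed-width format, every key has the same length, which is the condition the note above requires for this ordering to work.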
In AWS creating periodic triggers to Lambda using AWS Console is quite simple and straight-forward.
1. Log in to the console and navigate to CloudWatch.
2. Under Events, select Rules and click “Create Rule”.
3. Select either a fixed rate or a cron expression for more control. Cron expressions in CloudWatch start with minutes, not seconds, which is important to remember if you are copying a cron expression from somewhere else.
4. Click “Add Target”, select “Lambda Function” from the drop-down, and then select the appropriate Lambda function.
5. If you want to pass some data to the target function when it is triggered, expand “Configure Input”.
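To make the two schedule styles from the steps above concrete, here is a small sketch that only builds the expression strings. The helper names rateExpression and cronExpression are hypothetical, and nothing here talks to AWS. It assumes the CloudWatch Events formats: rate(value unit), with a singular unit when the value is 1, and cron(...) with six fields starting at minutes.

```javascript
// Builds a fixed-rate schedule expression. The unit is singular when the
// value is 1 ("rate(1 minute)") and plural otherwise ("rate(5 minutes)").
function rateExpression(value, unit) {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error("value must be a positive integer");
  }
  return `rate(${value} ${value === 1 ? unit : unit + "s"})`;
}

// Builds a cron schedule expression from six fields, minutes first:
// minutes, hours, day-of-month, month, day-of-week, year.
function cronExpression(fields) {
  const parts = fields.trim().split(/\s+/);
  if (parts.length !== 6) {
    throw new Error(`expected 6 fields, got ${parts.length}`);
  }
  return `cron(${parts.join(" ")})`;
}

console.log(rateExpression(1, "minute"));
console.log(rateExpression(5, "minute"));
console.log(cronExpression("0 12 * * ? *")); // every day at 12:00 UTC
```

The field-count check is there to catch expressions copied from tools that include a leading seconds field, which is the pitfall mentioned in the steps.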

Cloud Function Query Listener Based on Timestamp

I am building an application that must react when the timestamp for a certain Firestore document becomes older than the current time. Is there a way to set up this type of query listener as a Cloud Function, or otherwise achieve the desired goal of reacting to a document when its timestamp crosses the current time?
From what I can tell reading the Firestore and Cloud Functions documentation, query listeners may not be possible to set up as Cloud Functions. Furthermore, this is not just a regular query listener: the query criterion (time) is dynamic, so it isn't the typical query structure ("is A < 5") but a dynamic one ("is T < now", where "now" is changing every moment).
If it's true this is not possible as a query listener, I'd certainly appreciate any suggestions on how to achieve this goal through another means. One idea I had was to create a Cloud Function that triggers every 60 seconds and runs the queries based on the time at that moment, but this would not allow constant listening (and 60 seconds is unfortunately too long for our usage). Thank you so much in advance
Firestore queries can only filter on literal values that are explicitly stored in the documents they return. There's no way to perform a calculation in a query, so any time you need a now in the query - that timestamp will be calculated at the moment the query is created.
There are two common ways to implement the time-to-live type functionality that you describe:
1. Set up a process that periodically runs (e.g. a time-based Cloud Function), and every time the process runs perform a query to determine what documents have expired. As a variant of this, you could start a permanent listener for updates each time the Cloud Function triggers and keep that active for slightly less than the interval until the next trigger.
2. Create a Cloud Task for each document that expires/triggers when the document needs to be processed. While this may seem more complex, it actually ends up being simpler due to the fact that your callbacks now trigger on individual documents.
Also see: Is there any TTL (Time To Live ) for Documents in Firebase Firestore, which includes a link to Doug's excellent article on How to schedule a Cloud Function to run in the future with Cloud Tasks (to build a Firestore document TTL).
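The two approaches above differ mainly in where "now" enters the picture, which a small in-memory sketch can show. The names findExpired, delayUntilExpiry, and the expiresAt field are hypothetical, and no Firestore or Cloud Tasks API is called here.

```javascript
// Periodic process: "now" is fixed at the moment the check runs, so each run
// compares the stored timestamps against a literal value.
function findExpired(docs, nowMs) {
  return docs.filter((doc) => doc.expiresAt <= nowMs);
}

// One task per document: how long to wait before the task for a given
// document should fire. Never negative, so overdue documents run at once.
function delayUntilExpiry(doc, nowMs) {
  return Math.max(0, doc.expiresAt - nowMs);
}

const nowMs = Date.UTC(2024, 0, 1, 12, 0, 0);
const docs = [
  { id: "old", expiresAt: nowMs - 5000 },   // expired 5 seconds ago
  { id: "soon", expiresAt: nowMs + 1500 },  // expires in 1.5 seconds
];
console.log(findExpired(docs, nowMs).map((doc) => doc.id));
console.log(docs.map((doc) => delayUntilExpiry(doc, nowMs)));
```

With the periodic approach, the "soon" document is only noticed on the next run, so reaction time is bounded by the polling interval. With one task per document, the delay is computed once when the document is written, which is why it suits cases where 60 seconds is too coarse.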

How to secure data using Firestore's rules.duration & rules.timestamp?

JS / Node.js solution:
How can I use Firestore's rules.duration and/or rules.timestamp, or other Firestore rules, to ensure that a document can be created at most once a day?
Put another way, a user should be able to create, for example, a comment/remark/tweet at most once daily. How can this be enforced using Firestore security rules?
For instance, on Monday (24 Dec 2018) I could write a new comment. On Tuesday (25 Dec 2018) I could write another new comment. But if I tried to write a second new comment on Tuesday (25 Dec 2018), it would NOT be allowed.
The solution should be able to work for daily, weekly, monthly, or quarterly.
Security rules don't have a sense of time, other than the current moment in time that some access occurred, and other timestamps in other documents. So you will have to use timestamps in other documents to gate access.
The only way I can think of to achieve this is in conjunction with Cloud Functions. You could have a single document per user that acts as a write location for new post data. Rules on that document would check that the user is doing two things:
Writing the current time (the server timestamp) into a known field.
Waiting long enough: the current time must not be earlier than the previous value of that field plus the allowed interval.
When the write is successful, a Cloud Function could trigger on that write, then copy the post data from other fields in that document into the final document where the post must live.
Or you could simplify things a bit, skip the security rules, and just have a Cloud Function that deletes incoming documents that don't satisfy your post frequency rules, by querying for the two most recent posts from that user and checking their timestamps.
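The timestamp comparison that either approach relies on (in the rules or in the Cloud Function) can be sketched like this. Both helpers are hypothetical and work on plain millisecond timestamps. The first is a rolling interval since the last write, as described in the answer. The second is a calendar-day variant in UTC, which is an assumption on my part about how the Monday/Tuesday example in the question should be read.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Rolling window: a new post is allowed once at least `periodMs` has passed
// since the previous one (or if there is no previous post).
function canPostRolling(lastPostMs, nowMs, periodMs) {
  return lastPostMs === null || nowMs - lastPostMs >= periodMs;
}

// Calendar-day variant (UTC): at most one post per calendar day.
function canPostCalendarDay(lastPostMs, nowMs) {
  if (lastPostMs === null) return true;
  return Math.floor(nowMs / DAY_MS) > Math.floor(lastPostMs / DAY_MS);
}

const monday = Date.UTC(2018, 11, 24, 23, 0, 0);       // Mon 24 Dec 2018, 23:00
const tuesdayEarly = Date.UTC(2018, 11, 25, 1, 0, 0);   // Tue 25 Dec 2018, 01:00
const tuesdayLate = Date.UTC(2018, 11, 25, 23, 30, 0);  // Tue 25 Dec 2018, 23:30
console.log(canPostRolling(monday, tuesdayEarly, DAY_MS));
console.log(canPostCalendarDay(monday, tuesdayEarly));
console.log(canPostCalendarDay(tuesdayEarly, tuesdayLate));
```

The two variants disagree for a post late on Monday followed by one early on Tuesday, so it is worth deciding which meaning of "daily" is intended. Weekly or longer periods fit the rolling variant by changing periodMs.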

Schedule function in firebase

The problem
I have a firebase application in combination with Ionic. I want the user to create a group and define a time, when the group is about to be deleted automatically. My first idea was to create a setTimeout(), save it and override it whenever the user changes the time. But as I have read, setTimeout() is a bad solution when used for long durations (because of the firebase billing service). Later I have heard about Cron, but as far as I have seen, Cron only allows to call functions at a specific time, not relative to a given time (e.g. 1 hour from now). Ideally, the user can define any given time with a datetime picker.
My idea
So my idea is as follows:
User defines the date via native datepicker and the hour via some spinner
The client writes the time into a separate firebase-database with a reference of following form: /scheduledJobs/{date}/{hour}/{groupId}
Every hour, the Cron task will check all the groups at the given location and delete them
If a user plans to change the time, he will just delete the old value in scheduledJobs and create a new one
My question
What is the best way to schedule the automatic deletion of the group? I am not sure if my approach suits well, since querying for the date may create a very flat and long list in my database. Also, my approach is limited in a way, that only full hours can be taken as the time of deletion and not any given time. Additionally I will need two inputs (date + hour) from the user instead of just using a datetime (which also provides me the minutes).
I believe what you're looking for is node schedule. Basically, it allows you to run serverside cron jobs, it has the ability to take date-time objects and schedule the job at that time. Since I'm assuming you're running a server for this, this would allow you to schedule the deletion at whatever time you wish based on the user input.
An alternative to TheCog's answer (which relies on running a node server) is to use Cloud Functions for Firebase in combination with a third party server (e.g. cron-jobs.org) to schedule their execution. See this video for more or this blog post for an alternative trigger.
In either of these approaches I recommend keeping only upcoming triggers in your database. So delete the jobs after you've processed them. That way you know it won't grow forever, but rather will have some sort of fixed size. In fact, you can query it quite efficiently because you know that you only need to read jobs that are scheduled before the next trigger time.
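The "keep only upcoming triggers" recommendation above could be sketched as follows. This is an in-memory illustration with hypothetical names (scheduledJobs, scheduleDeletion, takeDueJobs). In the Firebase Realtime Database the same idea would be a query ordered by the scheduled time and ending at the current trigger time, followed by deleting the processed jobs.

```javascript
// Hypothetical in-memory job store keyed by group id. Each value is the time
// (in milliseconds) at which that group should be deleted.
const scheduledJobs = new Map();

// Scheduling again for the same group overwrites the earlier time.
function scheduleDeletion(groupId, deleteAtMs) {
  scheduledJobs.set(groupId, deleteAtMs);
}

// Called on every trigger: returns the groups that are due and removes their
// jobs, so the store only ever holds upcoming work.
function takeDueJobs(nowMs) {
  const due = [];
  for (const [groupId, deleteAtMs] of scheduledJobs) {
    if (deleteAtMs <= nowMs) {
      due.push(groupId);
      scheduledJobs.delete(groupId);
    }
  }
  return due;
}

const start = Date.UTC(2024, 0, 1, 12, 0, 0);
scheduleDeletion("groupA", start + 10 * 60 * 1000);
scheduleDeletion("groupB", start + 90 * 60 * 1000);
scheduleDeletion("groupA", start + 20 * 60 * 1000); // user changed the time
console.log(takeDueJobs(start + 60 * 60 * 1000), scheduledJobs.size);
```

Storing an exact timestamp per job, rather than a date and hour path, would also remove the full-hour limitation mentioned in the question: the precision then depends only on how often the trigger runs.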
If you're having problems implementing your approach, I recommend sharing the minimum code that reproduces where you're stuck as it will be easier to give concrete help that way.
