Firebase Realtime Database cache behaviour on Flutter

I'm trying to understand how the Firebase Realtime Database uses its cache. The documentation doesn't clarify some cases of cache handling, and for Flutter in particular there is no documentation and online sources are not enough. There are two scenarios I'm confused about.
First of all, for both scenarios I start by enabling persistence and setting the cache size:
await FirebaseDatabase.instance.setPersistenceEnabled(true);
await FirebaseDatabase.instance.setPersistenceCacheSizeBytes(10000000);
Scenario 1: I listen to the value of a specific user. I want to download the user data once, then always use the cache and download only updates, if there are any:
final stream = FirebaseDatabase().reference().child("users").child("some_id").onValue;
It's my understanding that Firebase will download the node first and use the cache later if there is no update. This won't change even if the app restarts.
Scenario 2: I want to query only the posts that were created after a given date:
final date = DateTime(2020,6,20);
final data = await FirebaseDatabase().reference().child("posts").orderByChild("createdAt").startAt(date.millisecondsSinceEpoch).once();
For Scenario 2, I'm not sure how caching will be done. If the Firebase Realtime Database caches the query, will it download everything when a new post is created after that date? Or will it download only the new post and get the others from the cache?

If there is a change to a location/query that you have a listener on, Firebase performs a so-called delta-sync on that data. In this delta-sync, the client calculates hashes on subtrees of its internal version of the data, and sends those to the server. The server compares those hashes with those of its own subtrees and only sends back the subtrees where the hashes are different. This is usually quite a bit smaller than the full data, but not necessarily the minimal delta.
Note that Firebase will always perform a delta sync between the data it has in memory already for the query/location and the data on the server, regardless of whether you enable disk persistence. Having disk persistence enabled just means the in-memory copy will initially be populated from disk, but after that the delta-sync works the same for both cases.
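Conceptually, the hash comparison works something like the sketch below. This is purely illustrative: the real wire protocol, hash function, and data structures are internal to Firebase and not shown here.

import { createHash } from "crypto";

type Tree = { [key: string]: Tree | string | number | boolean };

// Hash a subtree so that identical subtrees produce identical digests.
function hashSubtree(node: Tree | string | number | boolean): string {
  return createHash("sha1").update(JSON.stringify(node)).digest("hex");
}

// The client reports hashes of the subtrees it already has cached; only the
// subtrees whose hashes differ from the server's need to be resent.
function subtreesToResend(
  clientHashes: Record<string, string>,
  serverData: Tree
): Tree {
  const delta: Tree = {};
  for (const [key, value] of Object.entries(serverData)) {
    if (clientHashes[key] !== hashSubtree(value)) {
      delta[key] = value;
    }
  }
  return delta;
}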

Related

Backing up Firestore data incrementally

I'm trying to think of the best (read: automated, cheapest, and easiest to use) way to back up Firestore data for a production app.
I'm aware I could automate exports through a scheduled cloud function and send them over to a gcloud bucket. The problem I have with this approach is that it does not allow for "incremental updates of the new and updated documents" but only for backing up entire collections. This means that most of the data will be backed up every time, even though it hasn't changed since the last backup, driving the cost up for no reason.
The approach that came to mind was having a cloud function in "my-app" project that would listen to each and every change in the Firestore, and perform the same change in the Firestore of the "my-app-backup" project.
This way, I only back up the changed data. Furthermore, backed up data would never become stale (as it's backed up in real-time), unlike the first approach where automated backups happen e.g. daily or weekly.
Is this even possible, having a single cloud function in the first Firebase project writing data into another Firebase project? If not, perhaps write the data elsewhere(not in another Firebase project)? Does the approach even make sense, or do you have a better suggestion?
If you want to export only updated documents, you can store an updatedAt field and query documents with where("updatedAt", ">", lastExportTime). Then you can periodically run a Cloud Function to export those documents. This should cost only N reads (N = number of updated documents) each time the function runs.
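A minimal sketch of that approach, assuming a posts collection with an updatedAt timestamp, a my-app-backups bucket, and a backup/state document for bookkeeping (all of these names are placeholders):

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();
const bucket = admin.storage().bucket("my-app-backups"); // assumed bucket name

export const incrementalExport = functions.pubsub
  .schedule("every 24 hours")
  .onRun(async () => {
    // Read the timestamp of the previous successful export.
    const stateRef = db.doc("backup/state");
    const state = await stateRef.get();
    const lastExportTime = state.get("lastExportTime") ?? new Date(0);

    // Only documents touched since the last run.
    const snap = await db
      .collection("posts") // assumed collection name
      .where("updatedAt", ">", lastExportTime)
      .get();

    // Write the changed documents to the bucket as one JSON file per run.
    const payload = snap.docs.map((d) => ({ id: d.id, ...d.data() }));
    await bucket
      .file(`exports/${Date.now()}.json`)
      .save(JSON.stringify(payload));

    // Remember when this export ran, for the next incremental pass.
    await stateRef.set({ lastExportTime: new Date() }, { merge: true });
  });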
Furthermore, backed up data would never become stale (as it's backed up in real-time)
This works too but can also get expensive if the document updates are too frequent.

Firebase Firestore - delay between data is written and available for a query

I have noticed a delay between the time some data is written to Firestore and when it is available to be read via a query. In my tests I have seen this go up to 30 seconds.
Is this documented anywhere?
Are there ways to decrease this delay?
Is there a way to know the server timestamp corresponding to the data being returned? Or to have any indication about this delay in the data being returned from Firestore?
(say some data is written to the server at 1:00 - the document is created server-side at that time, I query it at 1:01, but due to the delay it returns the data as it was at 0:58 server-side, the timestamp would be 0:58)
I am not talking about anything under high load here; my tests were just about writing and then reading one small document.
Firestore can have some delay; I have noticed it as well. It's better to use the Realtime Database, as it lives up to its name: the time lag is minimal, less than a second in most cases!
If you use Firestore on a native device, offline-first is enabled by default. That means data is first written to a cache on your device and then synced to the backend. On the web you can enable that too. To get notified when a write has been saved to the backend, you need to enable includeMetadataChanges on your realtime listeners:
db.collection("cities").doc("SF")
.onSnapshot({
// Listen for document metadata changes
includeMetadataChanges: true
}, (doc) => {
// ...
});
That way you will be notified when the data has actually been written to the backend. You can read more about it here.
The delay should only occur between different devices. If you listen for changes on the same device where you write the data, they will show up immediately.
One more thing to be aware of: if you have offline persistence enabled, you should not await the writes. Just use them as if they were synchronous (even on the web with the offline feature enabled). I had awaits like that in my code before and it made the database look slow; removing them made the code run much faster.
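For illustration, a small sketch of that pattern, continuing the snippet above (the cities collection and the population field are just placeholders):

// With offline persistence enabled, this updates the local cache right away
// and the SDK syncs it to the backend in the background, so there is no need
// to await the returned promise before updating the UI.
db.collection("cities").doc("SF").set({ population: 880000 });

// A snapshot listener on this document (like the one above with
// includeMetadataChanges: true) fires immediately from the cache with
// doc.metadata.hasPendingWrites === true, and fires again once the
// backend has acknowledged the write.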

Firebase real time database transaction while offline

I am using the react-native-firebase package in a React Native application and am trying to understand how transactions work while offline. I am writing a transaction using the following code:
firebase.database().ref('locations').transaction(locations => {
  // ... my location modification logic
  return locations;
});
However, if I go offline before writing the transaction and have not accessed the reference previously and therefore have no cached data, locations is null.
There is this small tidbit in Firebase's official documentation
Note: Because your update function is called multiple times, it must be able to handle null data. Even if there is existing data in your remote database, it may not be locally cached when the transaction function is run, resulting in null for the initial value.
Which leads me to believe I should wrap the entire transaction logic inside
if (locations) {
  // ... my location modification logic
}
But I still don't fully understand this. Is the following assumption correct?
1. Submit transaction
2. If offline and cached data exists, apply the transaction against the cached data, then apply it to the current data in the remote database when connectivity resumes
3. If offline and no cached data exists, do not apply the transaction; once connectivity resumes, apply the transaction to the current data in the remote database
4. If online, immediately apply the transaction
If these assumptions are correct, then the user will not immediately see their change in case #3, but in case #2 it will 'optimistically' update their cached data and the user will feel like their action immediately took place. Is this how offline transactions work? What am I missing?
Firebase Realtime Database (and Firestore) don't support offline transactions at all. This is because a transaction must absolutely round trip with the server at least once in order to safely commit the changes to the data, while also avoiding collisions with other clients that could be trying to change the same data.
If you're wondering why the SDK doesn't just persist the callback that handles the transaction: persisting an instance of an object (and all of its dependent state, such as the values of all variables in scope) is actually very difficult, and is not even possible in all environments. So you can expect transactions to only work while the client app is online and able to communicate with the server.
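If the goal is simply to avoid firing a transaction that cannot commit, one possible guard (a sketch, not built-in SDK behaviour) is to check the special .info/connected location first, using the same firebase instance as in the question:

// Only attempt the transaction when the client believes it is connected.
async function safeLocationUpdate() {
  const connectedSnap = await firebase.database().ref('.info/connected').once('value');

  if (connectedSnap.val() !== true) {
    // Offline: queue the work yourself or fall back to a plain update,
    // because a transaction cannot commit without a server round trip.
    return;
  }

  await firebase.database().ref('locations').transaction(locations => {
    if (locations === null) {
      // No locally cached data yet. Returning undefined aborts the
      // transaction; the usual alternative is to construct a value that
      // handles the null case.
      return;
    }
    // ... my location modification logic
    return locations;
  });
}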

How can I keep objects stored in firebase cloud function RAM?

My application needs to build a couple of large hashmaps before processing a user's request. Ideally I want to store these hashmaps in-memory on the machine, which means it never has to do any expensive processing and can process any incoming requests quickly.
But this doesn't work well with Cloud Functions for Firebase, because a user request can trigger a new instance, which sets off the very time-consuming preprocessing step.
So, I tried designing my application to use the firebase database, and get only the data it needs from the database each time instead of holding all the data in-memory. But, since the cloud functions are downloading loads of data from the database, I have now triggered over 1.7 GB in download for this month, just by myself from testing. This goes over the quota.
There must be something I'm missing; all I want is permanent in-memory storage of some hashmaps, ready by the time the function is called with a request. It seems like such a simple requirement; how come there is no way to do this?
If you want to store data in the container that runs your Cloud Functions, you can use its local /tmp file system, which is actually kept in memory (it is a tmpfs). But this will disappear when the container is recycled, which happens when your function hasn't been invoked for a while, so this local file system will have to be rebuilt whenever a container spins up.
If you want permanent storage of the values you generate, consider using Google Cloud Storage. It is probably a more cost-effective option, and definitely the most scalable one.
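One common pattern (a sketch with a placeholder bucket, object name, and handleRequest function, assuming the maps can be precomputed and uploaded to Cloud Storage) is to keep the hashmaps in a module-scope variable so they survive across invocations served by the same container, and to load them once per container instead of rebuilding them on every request:

import * as functions from "firebase-functions";
import { Storage } from "@google-cloud/storage";

const storage = new Storage();

// Module-scope cache: survives across invocations handled by the same
// container instance, but is lost whenever the container is recycled.
let hashmaps: Record<string, unknown> | null = null;

async function getHashmaps(): Promise<Record<string, unknown>> {
  if (hashmaps === null) {
    // Download the precomputed maps once per container instead of
    // re-reading the database (and paying for downloads) on every request.
    const [contents] = await storage
      .bucket("my-app-precomputed")   // assumed bucket name
      .file("hashmaps.json")          // assumed object name
      .download();
    hashmaps = JSON.parse(contents.toString()) as Record<string, unknown>;
  }
  return hashmaps;
}

export const handleRequest = functions.https.onRequest(async (req, res) => {
  const maps = await getHashmaps();
  // ... use maps to answer the request
  res.json({ keys: Object.keys(maps).length });
});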

Does Firebase guarantee that data set using updateValues or setValue is available in the backend as one atomic unit?

We have an application that uses base64 encoded content to transmit attachments to backend. Backend then moves the content to Storage after some manipulation. This way we can enjoy world class offline support and sync and at the same time use the much cheaper Storage to store the files in the end.
Initially we used updateChildren to set the content in one go. This works fairly well, but when users started to upload bigger and more files at the same time, it resulted in the database silently freezing on end users' devices.
We then changed the code to write the files one by one using FirebaseDatabase.getInstance().getReference("/full/uri").setValue(base64stuff), and then using updateChildren to only set the metadata.
This allowed a seemingly endless number of files (provided each is chopped into chunks of at most 9 MB), but now we're facing another problem.
Our backend uses Firebase listener to start working once new content is available. The trigger waits for the metadata and then starts to process the attachments. It seems that even though the client device writes the files before we set the metadata, the backend usually receives the metadata before the content from the files is available. This forced us to change backend code to stop processing and check later again if the attachment base64 data is available.
This works, but is not elegant and wastes cpu cycles and increases latencies.
I haven't found anything in the docs about whether Firebase guarantees anything regarding the order in which the data is received by the backend. It seems that everything written in one go (using setValue or updateChildren) becomes available in the backend as one atomic unit.
Is this correct? Can I depend on that as a fact that will not change in the future?
The way I'm going to go about this (if the assumptions above are correct) is to first write the metadata from the client using updateChildren, like this:
"/uri/of/metadata/uid/attachments/attachment_uid1" = "per attachment metadata"
"/uri/of/metadata/uid/attachments/attachment_uid2" = "per attachment metadata"
and then write each base64 chunk using updateChildren with the following payload:
"/uri/of/metadata/uid/uploaded_attachments/attachment_uid2" = true
"/uri/of/base64/content/attachment_uid" = "base64content"
I can't use setValue for any of this data, to prevent accidental overwrites depending on the order in which the writes end up happening.
This would allow me to listen to /uri/of/base64/content and try to start handling the metadata package every time a new attachment finishes loading. The only thing needed to determine whether all files have already been uploaded is to grab the metadata and check that every attachment uid found under /attachments/ is also present under /uploaded_attachments/.
Writes from a single Firebase Database client are delivered to the server in the same order as they are executed on the client. They are also broadcast out to any listening clients in the same order.
There is no chance that another client will see the results of write B without seeing the results of write A (unless A was rejected by security rules).
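As an illustration of relying on that ordering, here is a sketch with hypothetical paths loosely based on the ones in the question, using the Admin SDK syntax:

import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.database();

// Placeholder identifiers for illustration only.
const uid = "some_user";
const attachmentUid = "attachment_uid1";
const base64content = "...";

// Write A: the base64 content. Not awaited on purpose; ordering is still
// guaranteed because both writes are issued by the same client.
db.ref(`uri/of/base64/content/${attachmentUid}`).set(base64content);

// Write B: the per-attachment "uploaded" flag, issued after write A.
// A backend listener that fires on this flag can rely on the content from
// write A already being visible, because writes from a single client reach
// the server and other listeners in the order they were made.
db.ref(`uri/of/metadata/${uid}/uploaded_attachments/${attachmentUid}`).set(true);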
