How to handle offline aggregation using Firestore? - firebase

I have been scouring the internet for days on a solution to this problem.
That is, how to handle aggregation when there is no network connection? I have a task management app that looks to aggregate meta data about user tasks. For example, the task can contain tags that can be aggregated to be shown in a dashboard to the user on a daily basis. This would be easy if the user is always online, so I could use transaction or cloud function to aggregate, but when the user is offline, the aggregation will appear to be incorrect, until the user restores their network connection.
Aggregation queries are explained here:
https://firebase.google.com/docs/firestore/solutions/aggregation
Which states a limitation:
Offline support - Client-side transactions will fail when the user's
device is offline, which means you need to handle this case in your
app and retry at the appropriate time.
However, there has yet to be any example or documentation on how to 'handle this case'. How would I go about addressing this problem?
Some thoughts:
I could cache the item if a transaction fails. This item will be aggregated on top of the stored aggregation. However, going down this line would mean that I can't take advantage of the Firestore's "offline mode", because I'm using my own cache on every write while offline anyway.
I could aggregate on demand. That is, never store the aggregation. This is going to be very heavy on read depending on how many tasks a user has. Furthermore, if the aggregation will need to be shared as insights to other users, this option will not work because other users do not have access to the tasks.
I'm at a loss and any help would be appreciated, thanks!

After a lot of research and trial and error I found a solution that can address this problem gracefully.
FieldValue.increment to the rescue.
What FieldValue.increment does is bypass the use of transaction while respecting the default Firestore's offline cache behaviour. It requires the use of set or update on the field directly. The drawback is the inability to use the 'withConverter' on the collection for type safety. I'm willing to live with the drawback considering how useful FieldValue.increment is.
I've done multiple tests and can confirm that the values can be incremented/decremented multiple times locally while offline. This offline value is reflected in a get or snapshot call to the cache. When the network connection is restored, the values are updated on the server.
The value itself is not stored on the cache, it simply stores the "difference" in the FieldValue sentinel for when it is time to update it on the server.
This method only works with incrementing and decrementing values. Storing averages will not be possible using this method. That is because the true total number of items is not known at the time of its calculation when offline.
Instead, the total number of items are stored along side the total value. The average is then calculated when and as needed. In this way the average will always be accurate from a local perspective when offline, and it will also be accurate when online when the total value and count has been synced.

Related

Update Firestore with user last active date

I'm looking at writing the date a user was last active to my firestore users table. This information is available in the metadata of the user - lastRefreshTime.
https://firebase.google.com/docs/reference/admin/node/admin.auth.UserMetadata
Has anyone already done this before?
I am looking for an efficient way to do this with minimal writes.
I could run a daily process that checks all users and the dates and updates if changed but wondering if there is a better more efficient way.
How about having each client write it themselves when they go online?
It won't be guaranteed (as malicious users may call the API themselves without writing the value), but it will prevent you from having to have an administrative process over a data set that will be hard to predict the growth of.

Firestore Document "Too much contention": such thing in realtime database?

I've built an app that let people sell tickets for events. Whenever a ticket is sold, I update the document that represents the ticket of the event in firestore to update the stats.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention", which results in inaccurate stats since the stat update is dropped. I guess this is the result of the high load on the document.
To resolve this problem, I am considering to move the stats of the items from the item document in firestore to the realtime database. Before I do, I want to be sure that this will actually resolve the problem I had with the contention on my item document. Can the realtime database handle such load better than a firestore document? Is it considered good practice to move such data to the realtime database?
The issue you're running into is a documented limit of Firestore. There is a limit to the rate of sustained writes to a single document of 1 per second. You might be able to burst writes faster than that for a while, but eventually the writes will fail, as you're seeing.
Realtime Database has different documented limits. It's measured in the total volume of data written to the entire database. That limit is 64MB per minute. If you want to move to Realtime Database, as long as you are under that limit, you should be OK.
If you are effectively implementing a counter or some other data aggregation in Firestore, you should also look into the distributed counter solution that works around the per-document write limit by sharding data across multiple documents. Your client code would then have to use all of these document shards in order to present data.
As for whether or not any one of these is a "good practice", that's a matter of opinion, which is off topic for Stack Overflow. Do whatever works for your use case. I've heard of people successfully using either one.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention"
This is happening because Firestore cannot handle such a rate. According to the official documentation regarding quotas for writes and transactions:
Maximum write rate to a document: 1 per second
Sometimes it might work for two or even three writes per second but at some time will definitely fail. 10 writes per second are way too much.
To resolve this problem, I am considering to move the stats of the items from the item document in Firestore to the realtime database.
That's a solution that I even I use it for such cases.
According to the official documentation regarding usage and limits in Firebase Realtime database, there is no such limitation there. But it's up to you to decide if it fits your needs or not.
There one more thing that you need to into consideration, which is distributed counter. It can solve your problem for sure.

Looking for an efficient way to update the data

I'm writing a small game for Android in Unity. Basically the person have to guess whats on the photo. Now my boss wants me to add an additional function-> after successful/unsuccessful guess the player will get the panel to rate the photo (basically like or dislike), because we want to track which photos are not good/remove the photos after a couple of successful guesses.
My understanding is that if we want to add +1 to the variable in Firebase first I have to make the call and get it then we have to make a separate call with adding 1 to the value we got. I was wandering if there is a more efficient way to do it?
Thanks for any suggestions!
Instead of requesting firebase when you want to add ,you can request firebase in the beginning (onCreate like method) and save the object and then use it when you want to update it.
thanks
Well, one thing you can do is to store your data temporarily in some object, but NOT send it to Firebase right away. Instead, you can send the data to Firebase in times when the app/game is about to get paused/minimized; hence, reducing potential lags and increasing player satisfaction. OnApplicationPause(bool) is one of such functions that gets called when the game is minimized.
To do what you want, I would recommend using a Transaction instead of just doing a SetValueAsync. This lets you change values in your large shared database atomically, by first running your transaction against the local cache and later against the server data if it differs (see this question/answer).
This gets into some larger interesting bits of the Firebase Unity plugin. Reads/writes will run against your local cache, so you can do things like attach a listener to the "likes" node of a picture. As your cache syncs online and your transaction runs, this callback will be asynchronously triggered letting you keep the value up to date without worrying about syncing during app launch/shutdown/doing your own caching logic. This also means that generally, you don't have to worry too much about your online/offline state throughout your game.

Aggregation on FireStore/CloudDatastore. Use Cloud Functions onCreate/Update?

I want to create an expense tracker and one of the things I want to find out is how much did I spend in each month per category.
How should I do this in FireStore/DataStore?
Pull down required data and do aggregation locally? Seems very slow?
Perform aggregation everytime a transaction is created/updated and save it in a table? But this may result in many invocations of the functions, which may be costly?
Is there a better way? Seems like 2 is currently the best option? But I wonder if theres anyway I can reduce costs?
I note that I may not need the aggregated data to be realtime, so is there a way to debounce the cloud function execution? Since I note that at times, I will batch insert a bunch of transactions. Wonder if theres a way to disable functions for certain queries and manually call them after the batch has finished for example?
The two approaches you describe are indeed the most common.
The best approach mostly depends on the number of transactions you have. If you have few transactions, then it may be totally fine to do the aggregation on each client. But as you get more transactions, the overhead of downloading the data will become prohibitive and you're more likely to want to keep a running total in the database.
I'd normally recommend keeping the total up to date with any transaction. You can even do that with client-side code, by using transactions (to prevent multiple users overwriting each other's updates) and server-side security rules (to prevent malicious actors from writing an aggregate that doesn't match its transaction).
If you want to aggregate in batches, you'll want to run code periodically, either in a server you control, or in Cloud Functions.
There is nothing built into Cloud Functions to debounce document writes. You could probably keep a debounce counter in Firestore, but that would then be reading/writing a document on each transaction.
More reasonable seems to run a function on a timer, as described in this blog post and shown in this video. But you'll need to make sure your data structure in that case allows the code to detect what transactions it needs to aggregate.
One way to do this is to ensure the transactions can be ordered in some way, e.g. by giving them a timestamp, and having your aggregation code keep track (likely in the database) of the last timestamp it has aggregated already. Then whenever the aggregator runs, it:
reads the current aggregated value
queries the database for transactions that have been added since it last ran
loops over those transactions, updating the aggregated value
writes the aggregated value and the last timestamp back to the database in a transaction (to ensure either both are written, or neither is written)

Firebase database high delay after a long standby

I'm currently testing Firebase on a non-production Firebase app which I am the only one who works on.
When I try to query the database to retrieve the data after there has not been any query during the last 24 hours, the query take about 8 seconds. After a query is done, the next ones would take normal amount of time (about 100ms).
This is not about caching the queries, by "next queries" I mean new queries which are not the same.
To reproduce it:
Create a database node called users, users children are user data (first name, last name, age, gender, etc)
Add 500,000 users to this node
Get a user by its UID and measure the time. (It should take about 100ms)
Wait 24 hours (I don't know the exact time, but I'm sure about 24 hours)
Get any user by its UID and measure the time. (It should take about 8sec)
Get any user by its UID and measure the time. (It should take about 100ms)
I want to know if this is a known issue to Firebase realtime database or not?
I reached Firebase support, they were able to recreate the issue and faced a wait time of about 6 seconds. Here is their answer after the investigation:
It looks like this is intended behavior. The realtime database queries work by building the index in-memory, which takes time linear to the number of nodes at that location. Once the index is built things are very fast, but the initial build can take a bit to build, especially for large locations.
If you wants the index to stay in memory on the database you should have a listener always listening for this query.
So basically the database takes a long time to process the query because of indexing the large database.
The problem can be solved by keeping a listener on the database or querying the database every few hours.
In production it is not very likely that you face this problem, because the database is being accessed by the user all the time, but if your database is not accessed all the time and you don't want the users experience that long wait time, you should utilize the discussed solution.
Firebase keeps recently used data in its internal cache. This cache is cleared after a few minutes.
But the exact numbers depend on how much data you're loading and how you're loading that data. Without seeing a specific setup that shows how to reproduce these numbers there really isn't much anyone can say.

Resources