Firebase database high delay after a long standby - firebase

I'm currently testing Firebase on a non-production Firebase app which I am the only one who works on.
When I try to query the database to retrieve the data after there has not been any query during the last 24 hours, the query take about 8 seconds. After a query is done, the next ones would take normal amount of time (about 100ms).
This is not about caching the queries, by "next queries" I mean new queries which are not the same.
To reproduce it:
Create a database node called users, users children are user data (first name, last name, age, gender, etc)
Add 500,000 users to this node
Get a user by its UID and measure the time. (It should take about 100ms)
Wait 24 hours (I don't know the exact time, but I'm sure about 24 hours)
Get any user by its UID and measure the time. (It should take about 8sec)
Get any user by its UID and measure the time. (It should take about 100ms)
I want to know if this is a known issue to Firebase realtime database or not?

I reached Firebase support, they were able to recreate the issue and faced a wait time of about 6 seconds. Here is their answer after the investigation:
It looks like this is intended behavior. The realtime database queries work by building the index in-memory, which takes time linear to the number of nodes at that location. Once the index is built things are very fast, but the initial build can take a bit to build, especially for large locations.
If you wants the index to stay in memory on the database you should have a listener always listening for this query.
So basically the database takes a long time to process the query because of indexing the large database.
The problem can be solved by keeping a listener on the database or querying the database every few hours.
In production it is not very likely that you face this problem, because the database is being accessed by the user all the time, but if your database is not accessed all the time and you don't want the users experience that long wait time, you should utilize the discussed solution.

Firebase keeps recently used data in its internal cache. This cache is cleared after a few minutes.
But the exact numbers depend on how much data you're loading and how you're loading that data. Without seeing a specific setup that shows how to reproduce these numbers there really isn't much anyone can say.

Related

How to handle offline aggregation using Firestore?

I have been scouring the internet for days on a solution to this problem.
That is, how to handle aggregation when there is no network connection? I have a task management app that looks to aggregate meta data about user tasks. For example, the task can contain tags that can be aggregated to be shown in a dashboard to the user on a daily basis. This would be easy if the user is always online, so I could use transaction or cloud function to aggregate, but when the user is offline, the aggregation will appear to be incorrect, until the user restores their network connection.
Aggregation queries are explained here:
https://firebase.google.com/docs/firestore/solutions/aggregation
Which states a limitation:
Offline support - Client-side transactions will fail when the user's
device is offline, which means you need to handle this case in your
app and retry at the appropriate time.
However, there has yet to be any example or documentation on how to 'handle this case'. How would I go about addressing this problem?
Some thoughts:
I could cache the item if a transaction fails. This item will be aggregated on top of the stored aggregation. However, going down this line would mean that I can't take advantage of the Firestore's "offline mode", because I'm using my own cache on every write while offline anyway.
I could aggregate on demand. That is, never store the aggregation. This is going to be very heavy on read depending on how many tasks a user has. Furthermore, if the aggregation will need to be shared as insights to other users, this option will not work because other users do not have access to the tasks.
I'm at a loss and any help would be appreciated, thanks!
After a lot of research and trial and error I found a solution that can address this problem gracefully.
FieldValue.increment to the rescue.
What FieldValue.increment does is bypass the use of transaction while respecting the default Firestore's offline cache behaviour. It requires the use of set or update on the field directly. The drawback is the inability to use the 'withConverter' on the collection for type safety. I'm willing to live with the drawback considering how useful FieldValue.increment is.
I've done multiple tests and can confirm that the values can be incremented/decremented multiple times locally while offline. This offline value is reflected in a get or snapshot call to the cache. When the network connection is restored, the values are updated on the server.
The value itself is not stored on the cache, it simply stores the "difference" in the FieldValue sentinel for when it is time to update it on the server.
This method only works with incrementing and decrementing values. Storing averages will not be possible using this method. That is because the true total number of items is not known at the time of its calculation when offline.
Instead, the total number of items are stored along side the total value. The average is then calculated when and as needed. In this way the average will always be accurate from a local perspective when offline, and it will also be accurate when online when the total value and count has been synced.

Firebase DB slow download

I have a Firebase realtime DB i am using to track user analytics. Currently there is about 11 000 users and each of them has quite a bit of entries ( from ten to few hundreds based on how long they interacted with the app ). Json file is 76MBs when i export whole DB.
I am using this data only for analytics, so i will have a look once per day or so on all of the data. Ie i need to download whole DB to get all the data.
When i do that, it takes about 3-5 minutes to actually load the data. I can imagine that if there were ten times more users, it would not be usable then anymore, because of load time.
So i am wondering if these load times are normal and if this is realy bad practice to do such thing? The reason i always download whole DB, is that i want to get overall data, ie how many users is registered and then for example how many ads were watched. To do that, i need to go into each user and see how many ads he watched and count them up. I cant do that without having access to data of all users.
This is first time i am doing something like this on a bit larger scale and those 76MBs are a bit surprising to me as well as the load times to get the data. It seems like its not feasable long term to use this setup.
If you only need this data yourself, consider using the automated backups to get access to the JSON. These backups are made out-of-band, meaning that they (unlike your current process) don't interfere with the handling of other client requests.
Additionally, if you're only using the database for gathering user analytics, consider offloading the data to a database that's more suitable for this purpose. So: use Realtime Database for the user's to send the data to you, but remove it from there to a cheaper/better place after that.
For example, it is quite common to transfer the data to BigQuery, which has much better ad-hoc querying capabilities than Realtime Database.

Firestore Document "Too much contention": such thing in realtime database?

I've built an app that let people sell tickets for events. Whenever a ticket is sold, I update the document that represents the ticket of the event in firestore to update the stats.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention", which results in inaccurate stats since the stat update is dropped. I guess this is the result of the high load on the document.
To resolve this problem, I am considering to move the stats of the items from the item document in firestore to the realtime database. Before I do, I want to be sure that this will actually resolve the problem I had with the contention on my item document. Can the realtime database handle such load better than a firestore document? Is it considered good practice to move such data to the realtime database?
The issue you're running into is a documented limit of Firestore. There is a limit to the rate of sustained writes to a single document of 1 per second. You might be able to burst writes faster than that for a while, but eventually the writes will fail, as you're seeing.
Realtime Database has different documented limits. It's measured in the total volume of data written to the entire database. That limit is 64MB per minute. If you want to move to Realtime Database, as long as you are under that limit, you should be OK.
If you are effectively implementing a counter or some other data aggregation in Firestore, you should also look into the distributed counter solution that works around the per-document write limit by sharding data across multiple documents. Your client code would then have to use all of these document shards in order to present data.
As for whether or not any one of these is a "good practice", that's a matter of opinion, which is off topic for Stack Overflow. Do whatever works for your use case. I've heard of people successfully using either one.
On peak times, this document is updated quite a lot (10x a second maybe). Sometimes transactions to this item document fail due to the fact that there is "too much contention"
This is happening because Firestore cannot handle such a rate. According to the official documentation regarding quotas for writes and transactions:
Maximum write rate to a document: 1 per second
Sometimes it might work for two or even three writes per second but at some time will definitely fail. 10 writes per second are way too much.
To resolve this problem, I am considering to move the stats of the items from the item document in Firestore to the realtime database.
That's a solution that I even I use it for such cases.
According to the official documentation regarding usage and limits in Firebase Realtime database, there is no such limitation there. But it's up to you to decide if it fits your needs or not.
There one more thing that you need to into consideration, which is distributed counter. It can solve your problem for sure.

Cloud Firestore Data Structure

I am creating an application that uses cloud firestore to store data about "events" in our lab on several assets. We collected data for a few months and we are averaging about 2000 events per asset per month. Each event captures a few pieces of meta data that the user can query.
I imported all the data into firestore with a very simple layout at first.
Events (Collection of event data)
-> EventData (documents which contains a few fields for metadata)
From my understanding, even if the collection of events becomes quite large, for billing and speed of queries this won't be a problem (assuming I do some sort of pagination on the query results). The composite indexes are also very manageable with this structure.
The problem I see, is if someone goes and looks at the firestore console and brings that collection up, our read requests go through the roof. It seems that does a full read on the entire collection...which of course will kill us on billing as time goes on. I don't see this as a problem forever as eventually we should get to the point where everything is stable and won't need to go into the console very often, but what if someone does when we have a million or more records.
My next thought was to structure the database like this:
Events -> Assets -> {Asset_Name } -> {year_month} -> {Collection of
Document with field meta-data}
This certainly solves the issue of the ever growing collection of documents. The number of assets that we have is fixed, and the number of events is (effectively) capped to a maximum amount per month as well. The problem with this setup, however, is managing composite indexes. There are about 5 indexes needed for my original setup. I think this alternative setup means I would need to setup the same 5 indexes for each each collection of documents for every asset every month.
I thought maybe there could be a way to have a cloud function manage it for me (it doesn't appear there is an API for this). I think the number of indexes per project is also capped.
So, in the end, I am looking for recommendations on how to structure this database to limit reads if using the console, as well as keeping the indexes manageable. I am pretty new to NoSQL and perhaps I am just completely off.
I recommend you keep your structure as is if that's what's working for you. You should not need to optimize for reducing console reads. Console reads do count towards your usage but the console does not load the entire collection when you open the console.
The console loads just enough documents to let you scroll a bit and then it loads more documents if you scroll down. It will only load the entire collection if you scroll through the entire collection.

Is there anyway to monitor one single (class) of object in terms of cache?

I am trying to determine which implementation of the data structure would be best for the web application. Basically, I will maintain one set of "State" class for each unique user, and the State will be cached for some time when the user login, and after the non-sliding period, the state is saved to the db. So in order to balance the db load and the iis memory, I have to determine what is the best (expected) timeout for the cache.
My question is, how to monitor the particular cache activity for one set of object? I tried perfmon, and it gives roughly the % of total memory limit, but no idea on size or so (maybe even better, I could get a list of all cached objects and also the size and other performance issue data).
One last thing, I expect the program is going to handle 100,000+ cached user and each of them may do a request in about 10s-60s. So performance does matters to me.
What exactely are you trying to measure here? If you just want to get the size of your in-memory State instances at any given time, you can use an application-level counter and add/substract every time you create/remove an instance of State. So you know your State size, you know how many State instances you have. But if you already count on getting 100.000+ users each requesting at least once / minute you can actually do the math.

Resources