I have a Firebase Realtime Database that I am using to track user analytics. Currently there are about 11,000 users, and each of them has quite a few entries (from ten to a few hundred, depending on how long they interacted with the app). The JSON file is 76 MB when I export the whole DB.
I am using this data only for analytics, so I look at all of the data about once per day. That is, I need to download the whole DB to get all the data.
When I do that, it takes about 3-5 minutes to actually load the data. I can imagine that with ten times more users it would no longer be usable because of the load time.
So I am wondering whether these load times are normal and whether this is really bad practice. The reason I always download the whole DB is that I want overall figures, i.e. how many users are registered and, for example, how many ads were watched. To do that, I need to go into each user, see how many ads they watched, and add them up. I can't do that without access to the data of all users.
This is the first time I am doing something like this on a somewhat larger scale, and those 76 MB are a bit surprising to me, as are the load times to get the data. It seems this setup is not feasible long term.
If you only need this data yourself, consider using the automated backups to get access to the JSON. These backups are made out-of-band, meaning that they (unlike your current process) don't interfere with the handling of other client requests.
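Once the backup file is on your machine, the totals from the question can be computed offline. Here is a minimal sketch in Python. The layout is an assumption, not something given in the question: a top-level `users` node keyed by UID, each user holding an `events` map whose entries have a `type` field, with `ad_watched` as an invented value. Adjust the names to your real schema.

```python
import json
from collections import Counter


def summarize(export: dict) -> dict:
    """Count registered users and watched ads in an exported snapshot.

    Assumed (hypothetical) layout:
    {"users": {uid: {"events": {push_id: {"type": "ad_watched", ...}}}}}
    """
    users = export.get("users") or {}
    totals = Counter()
    for user in users.values():
        # A user without any events is still a registered user.
        events = (user or {}).get("events") or {}
        for event in events.values():
            totals[event.get("type", "unknown")] += 1
    return {"users": len(users), "ads_watched": totals["ad_watched"]}


def summarize_file(path: str) -> dict:
    """Load a JSON export from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize(json.load(fh))
```

At 76 MB, `json.load` of the whole file should fit in memory on a typical machine; a much larger export would probably call for a streaming parser instead.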
Additionally, if you're only using the database for gathering user analytics, consider offloading the data to a database that's more suitable for this purpose. So: use Realtime Database for the users to send the data to you, but move it from there to a cheaper/better place after that.
For example, it is quite common to transfer the data to BigQuery, which has much better ad-hoc querying capabilities than Realtime Database.
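BigQuery load jobs accept newline-delimited JSON, so one way to prepare the export is to flatten the nested tree into one row per event. This is only a sketch: the `users`/`events` layout and the `uid`/`event_id` column names are assumptions for illustration, and each event is assumed to be a flat JSON object.

```python
import json
from typing import Iterator


def to_rows(export: dict) -> Iterator[dict]:
    """Yield one flat row per event from a nested export.

    Assumed (hypothetical) layout:
    {"users": {uid: {"events": {event_id: {...fields...}}}}}
    """
    users = export.get("users") or {}
    for uid, user in users.items():
        events = (user or {}).get("events") or {}
        for event_id, event in events.items():
            # Keys from the tree path become ordinary columns.
            yield {**event, "uid": uid, "event_id": event_id}


def write_ndjson(export: dict, path: str) -> int:
    """Write rows as newline-delimited JSON and return the row count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for row in to_rows(export):
            out.write(json.dumps(row) + "\n")
            count += 1
    return count
```

The resulting file can then be loaded into a table, after which counts per user or per event type are ordinary SQL queries.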
I am creating an application that uses cloud firestore to store data about "events" in our lab on several assets. We collected data for a few months and we are averaging about 2000 events per asset per month. Each event captures a few pieces of meta data that the user can query.
I imported all the data into firestore with a very simple layout at first.
Events (Collection of event data)
-> EventData (documents which contains a few fields for metadata)
From my understanding, even if the collection of events becomes quite large, for billing and speed of queries this won't be a problem (assuming I do some sort of pagination on the query results). The composite indexes are also very manageable with this structure.
The problem I see, is if someone goes and looks at the firestore console and brings that collection up, our read requests go through the roof. It seems that does a full read on the entire collection...which of course will kill us on billing as time goes on. I don't see this as a problem forever as eventually we should get to the point where everything is stable and won't need to go into the console very often, but what if someone does when we have a million or more records.
My next thought was to structure the database like this:
Events -> Assets -> {Asset_Name} -> {year_month} -> {Collection of Document with field meta-data}
This certainly solves the issue of the ever-growing collection of documents. The number of assets that we have is fixed, and the number of events is (effectively) capped to a maximum amount per month as well. The problem with this setup, however, is managing composite indexes. There are about 5 indexes needed for my original setup. I think this alternative setup means I would need to set up the same 5 indexes for each collection of documents, for every asset, every month.
I thought maybe there could be a way to have a cloud function manage it for me (it doesn't appear there is an API for this). I think the number of indexes per project is also capped.
So, in the end, I am looking for recommendations on how to structure this database to limit reads if using the console, as well as keeping the indexes manageable. I am pretty new to NoSQL and perhaps I am just completely off.
I recommend you keep your structure as is if that's what's working for you. You should not need to optimize for reducing console reads. Console reads do count towards your usage but the console does not load the entire collection when you open the console.
The console loads just enough documents to let you scroll a bit and then it loads more documents if you scroll down. It will only load the entire collection if you scroll through the entire collection.
I have events in a Firebase database table where each event has certain fields. One of the fields is event_type. What I want to achieve is to visualize, in graphical form, how many events of each type come in daily.
How do I do something like that in firebase database?
Q1. Is it possible to directly do this in firebase?
Q2. Do I need to move data to some other datasource (like Big query) and setup dashboard there?
It is definitely possible to create a dashboard with aggregate data directly on the Firebase Realtime Database. But you'll have to take a different approach than with e.g. BigQuery.
With relational databases, you'll create a dashboard by running aggregation queries. For example to show how many events of each type, you'll run something like SELECT type, COUNT(*) FROM events GROUP BY type.
The Firebase Realtime Database (like most NoSQL databases) has neither a GROUP BY operation nor a COUNT() method. That means you'd have to load all data into your dashboard and group/count it there, which is quite expensive. That's why on NoSQL databases you'll typically keep a running count for each type in the database and update it on every write operation. While this puts an overhead on each write operation, the dashboard itself becomes very simple when you do this. For an example of a simple counter, see the functions-samples repo.
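To illustrate the running-count idea, here is a minimal sketch in Python that uses an in-memory object as a stand-in for the database. `EventStore`, `write_event`, and `daily_counts` are hypothetical names, not Firebase APIs. In a real database the counter update would need to be atomic (for example inside a transaction) so that concurrent writers do not lose increments.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


class EventStore:
    """In-memory stand-in for the database, for illustration only."""

    def __init__(self) -> None:
        self.events: List[dict] = []
        # counters[day][event_type] holds the running count.
        self.counters: Dict[str, Dict[str, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def write_event(self, event_type: str, timestamp: datetime) -> None:
        """Store the event and bump its per-day counter in the same step."""
        self.events.append(
            {"event_type": event_type, "timestamp": timestamp.isoformat()}
        )
        day = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
        self.counters[day][event_type] += 1

    def daily_counts(self, day: str) -> Dict[str, int]:
        """Return counts for one day without touching the raw events."""
        return dict(self.counters.get(day, {}))
```

A dashboard built on this shape reads only the small per-day counter node, so its cost stays the same no matter how many raw events exist.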
This approach only works if you know up front what counters (and other aggregates) you want to show in the dashboard. If that isn't the case, many developers use the nightly backups from the Realtime Database to ingest the data into another system that lends itself more to exploratory querying, such as BigQuery.
Either approach can work fine. The right approach is a matter of your exact use-case (e.g. do you know the exact data you want in the dashboard, or are you still figuring that out?) and what you're most comfortable with.
I'm currently testing Firebase on a non-production Firebase app which I am the only one who works on.
When I query the database after there has been no query during the last 24 hours, the query takes about 8 seconds. After that query is done, the next ones take a normal amount of time (about 100ms).
This is not about caching the queries, by "next queries" I mean new queries which are not the same.
To reproduce it:
Create a database node called users, users children are user data (first name, last name, age, gender, etc)
Add 500,000 users to this node
Get a user by its UID and measure the time. (It should take about 100ms)
Wait 24 hours (I don't know the exact time, but I'm sure about 24 hours)
Get any user by its UID and measure the time. (It should take about 8sec)
Get any user by its UID and measure the time. (It should take about 100ms)
I want to know if this is a known issue to Firebase realtime database or not?
I reached out to Firebase support; they were able to reproduce the issue and saw a wait time of about 6 seconds. Here is their answer after the investigation:
It looks like this is intended behavior. The realtime database queries work by building the index in-memory, which takes time linear to the number of nodes at that location. Once the index is built things are very fast, but the initial build can take a bit to build, especially for large locations.
If you want the index to stay in memory on the database you should have a listener always listening for this query.
So basically the database takes a long time to process the query because of indexing the large database.
The problem can be solved by keeping a listener on the database or querying the database every few hours.
In production you are not very likely to face this problem, because the database is being accessed by users all the time. But if your database is not accessed all the time and you don't want users to experience that long wait time, you should use the solution discussed above.
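The "query every few hours" variant can be sketched as a small loop. This is only an illustration: `keep_warm` is a hypothetical helper, and `query` stands for whatever function in your own code performs a cheap lookup against the large node.

```python
import time
from typing import Callable, List


def keep_warm(
    query: Callable[[], object],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run `query` `rounds` times, pausing `interval_s` seconds between runs.

    Returns the elapsed seconds of each run, so a slow first run
    (cold index) can be told apart from the fast ones that follow.
    """
    timings = []
    for i in range(rounds):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
        if i < rounds - 1:
            sleep(interval_s)
    return timings
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting. In practice a scheduled job that runs one query per invocation would serve the same purpose.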
Firebase keeps recently used data in its internal cache. This cache is cleared after a few minutes.
But the exact numbers depend on how much data you're loading and how you're loading that data. Without seeing a specific setup that shows how to reproduce these numbers there really isn't much anyone can say.
The amount of coding that goes into the making of a DataSet is often significant. Now I'm not sure what the industry standard or best practice is when dealing with data requests from multiple ASP.NET pages. Should I use a cache/session to pass the DataSet from page to page, or should I fetch directly from the database for each page?
What's the most common approach here?
Here are my thoughts:
It depends on the database and the type of data that you're trying to get, as well as what may modify the data. Do you have backend processes that run concurrent with the data you're going to want? Is this data only updated because of the current page, or does it update at all? How many people are going to use said page?
I personally almost always call to the database, simply because there are so many what-ifs when it comes to this kind of thing. At any time the data can change; it's never as static as people would think it would be. I would personally trade correct data over performance any day.
But that's just me personally. This question is so open ended that it's impossible to take every single thing into consideration since I don't know your database structure, nor how expensive it is to retrieve it, nor what you're using it for.
Sorry I couldn't really be more help.
It depends on your needs. If the data size is very large, then don't save it in Session or Cache, because Session and Cache are stored in server memory. Session is user specific and will store the data for each user on the server, so avoid it. I think you should fetch the data directly each time you need it; don't save it in Session. If the data is very small/limited, then you can save it in Session (for example UserName or UserId). If you are using a GridView to show data, then use paging and fetch the data from the database on each page request.
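Fetching one page per request can be sketched as follows. The example is in Python, with SQLite standing in for the real database, and not in ASP.NET; the `customers` table and its columns are hypothetical. The exact paging syntax differs between database engines, so treat the SQL as an illustration of the idea.

```python
import sqlite3
from typing import List, Tuple


def fetch_page(
    conn: sqlite3.Connection, page: int, page_size: int = 20
) -> List[Tuple[int, str]]:
    """Return one page of rows (1-based page number), ordered by id."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    # Only `page_size` rows ever leave the database for this request.
    cur = conn.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()
```

A stable `ORDER BY` matters here: without it, the same row could show up on two pages or on none.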
In my web application, I have a dynamic query that returns a huge amount of data to a DataTable, and this query is often re-run with different parameters. As a result, the database is overloaded.
I want to load all records, with no parameters, into an object and run the queries (maybe with LINQ) against this object, so that the database is no longer overloaded.
Which objects can be used instead of datatable?
This is one of my pet peeves - people who return all the data from the database.
There is absolutely no need for this unless you are doing reporting.
If you are doing reporting, then you need to increase your hardware capability so that the database can cope. This may also include tuning your database, rearranging tables, reindexing, regular rebuilding of indexes, updating statistics, archiving out old data, etc.
If you are NOT doing reporting, then start limiting how much data can be queried at any one time. Users DO NOT need to see massive quantities of data all at once. They need to see discrete amounts of data presented in a manageable and coherent way.
Another rule of thumb I like to observe is: let your database server do the work. It is made to manipulate lots of data, that is what it is good at, and it should have the power to do it. Pulling back loads of data to the client and then trying to manipulate that data on the client is a foolish thing to do. If your client machines are more powerful than the database server, then you have issues.
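As a small illustration of letting the database do the work, the sketch below filters and aggregates inside the query and returns only the summary rows. It uses Python with SQLite as a stand-in; the `orders` table, its columns, and `order_totals` are hypothetical names.

```python
import sqlite3
from typing import Dict, Tuple


def order_totals(
    conn: sqlite3.Connection, customer_id: int
) -> Dict[str, Tuple[int, int]]:
    """Return {status: (order_count, total_amount)} for one customer.

    The WHERE and GROUP BY run inside the database, so only one
    row per status crosses the wire, not every order.
    """
    cur = conn.execute(
        "SELECT status, COUNT(*), SUM(amount) FROM orders "
        "WHERE customer_id = ? GROUP BY status ORDER BY status",
        (customer_id,),
    )
    return {status: (count, total) for status, count, total in cur}
```

With an index on the filter column, a parameterized query like this is usually far cheaper than loading every record into memory and filtering there.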
Never ever do this (except for caching)!
You are trying to implement DB mechanisms, like
persistent storage
index search and query strategy
replication
and so on
Spend your time on DB optimization (optimal schema, indexes, queries, partitioning).