How is Firebase download data (usage) computed?

I have a Firebase project that is showing a lot of usage in terms of download data size. It reports approximately 30GB a month, while the data in my Realtime Database is very small (~70-80MB).
I am wondering how I can figure out whether that really reflects how the system is being used. Is there a way to see logs, the traffic distribution, and which nodes in my Realtime Database are hit particularly often?
Is it possible to get a list of the calls being made to my database that download data? It may be that someone on my team is running a script to download the data and play around with it.
Here is a snapshot for reference:

Related

How to take druid segment data backup?

I am new to Druid. In our application we use Druid for time-series data, and this can get pretty large (10-20TB).
Druid provides deep storage. But if this deep storage crashes or is not reachable, it will result in data loss, which in turn affects the analytics the application is running.
I am thinking of taking an incremental backup of the Druid segment data to some secure location, such as an FTP server. Then, if deep storage is unavailable, we can restore the data from this FTP server.
Is there any tool or utility available in Druid to incrementally back up and restore Druid segments?
In general it's important to take regular snapshots of the metadata storage as this is the "index" of what's in the Deep Storage. Maybe one snapshot per day, and store them for however long you like. It's good to store them for at least a couple of weeks, in case you need to roll back for some reason.
You also need to back up new segments in deep storage when they appear. It isn't important to take consistent snapshots, just to get every file eventually.
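The "get every file eventually" idea can be sketched as a small copy job that is safe to run repeatedly. This is only an illustration, not a Druid tool: it assumes deep storage is a directory reachable as a local or mounted filesystem path, it assumes segment files do not change once written, and the function name `backup_new_segments` is made up for this example.

```python
import shutil
from pathlib import Path


def backup_new_segments(deep_storage: Path, backup_root: Path) -> list[Path]:
    """Copy every file under deep_storage that is not yet in backup_root.

    Returns the relative paths copied during this run.
    """
    copied = []
    for src in sorted(deep_storage.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(deep_storage)
        dst = backup_root / rel
        # Assumption: a segment file never changes after it is written, so a
        # file already present in the backup with the same size is skipped.
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied
```

Running it on a schedule picks up whatever appeared since the last run; a run that finds nothing new simply returns an empty list. The metadata storage snapshots still have to be taken separately.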
Also see https://groups.google.com/g/druid-user/c/itfKT5vaDl8
One other note, as you mentioned data loss: Deep Storage is not queried directly. Queries execute on the local segment cache in, for example, the Historical process. Deep Storage is written to at ingestion time, so you might "lose" data that can't be ingested again once Deep Storage is available, but you will continue to get analytics capability because the already-loaded data is on the Historicals. Just a thought!
I hope that helps.

Calculate realm database size

I've implemented a Realm database for offline data, but I'm thinking of using the syncing function with a hosted server, for example at DigitalOcean.
But the question is: how do I get a good estimate of the size of the online database?
The data is just strings and numbers, like a notepad app. I looked at the offline Realm file and it's about 2MB (which feels large; if I just write the data to a file as a blob, it's around 50KB). Then it got me thinking: if that's the data size for each user, and I have around 500,000 users, then it's 1TB of data, and that costs too much to afford as a hosting service for a hobby project.
Or can I count on around 50KB per user, which would come to roughly 25GB?
I don't want to roll out syncing and then realize that I can't use it because I don't have enough storage space on the server and need to remove the feature.
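For reference, the two per-user figures from the question scale as follows. This is plain arithmetic under stated assumptions: decimal units (1 GB = 1,000,000,000 bytes), exactly 500,000 users, and no allowance for the change log or other overhead the server keeps.

```python
USERS = 500_000


def total_gb(per_user_bytes: int, users: int = USERS) -> float:
    """Total storage in decimal gigabytes for a fixed per-user size."""
    return per_user_bytes * users / 1_000_000_000


realm_file_estimate = total_gb(2_000_000)  # 2 MB Realm file per user
raw_blob_estimate = total_gb(50_000)       # 50 KB raw blob per user

print(f"Realm file per user: {realm_file_estimate:.0f} GB")  # 1000 GB, i.e. 1 TB
print(f"Raw blob per user:   {raw_blob_estimate:.0f} GB")    # 25 GB
```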
You can probably expect such a size when syncing to the server. The server has to keep a log of all changes in the realms too, to be able to do joins and migrations automatically.
Storing the data straight on disk takes just a few bytes of characters. The Realm file contains the data plus metadata from the RealmObject itself, indexes for searching the data, and more. So yes, the Realm file is a lot larger, but it also contains much more information.

Firebase database bandwidth usage growing rapidly even when the database is not in use

Update: After 9 months of back-and-forth emails (over 40 of them), Google has acknowledged that they have found some bugs that may be responsible for the high bandwidth usage, but bandwidth usage is still too high. Resolving this issue does not appear to be a priority for Google/Firebase (it took them 1.5 months to respond to the last email). In light of similar complaints such as https://news.ycombinator.com/item?id=14356409, and many others from a wide range of teams and developers, hopefully the situation will improve some day.
I'm just starting a Firebase project and have not accessed the database from any client. I have only created a single tiny test key-value pair in the database (using the console), which uses 23 B of data storage. Surprisingly, the console shows that I have used 215.9 KB (including when I was not touching Firebase at all). This number continues to grow every hour even though I am not using Firebase or even refreshing the data tab in the console!
Here is a screenshot of the console bandwidth usage chart:
[Screenshot: Firebase console bandwidth usage]
Others appear to be having the same problem, but there has been no response from Firebase/Google. What's going on? Any help would be greatly appreciated.
The usage chart takes time to update. You may be seeing bandwidth from a few minutes to a few hours ago.
Also, this reminds me of the old Google Analytics referrer issue. The default rules for Firebase look something like this:
".read": true,
".write": "auth != null"
This means that anyone, anywhere, can read from your database, and that anyone who is authenticated (even anonymously) can write to it. Since it is a NoSQL database that serves JSON, it is possible that the traffic is just crawlers, the equivalent of Google Analytics referral spam.
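One way to check an exported rules file for this kind of open read access is a small script like the one below. It is only a sketch: it assumes the rules are plain JSON without comments, it only catches a literal `true` (or the string `"true"`), not expressions that happen to always evaluate to true, and the helper name `world_readable_paths` is invented for this example.

```python
import json


def world_readable_paths(rules: dict, path: str = "") -> list[str]:
    """Return every rules path whose ".read" is literally true."""
    hits = []
    if rules.get(".read") is True or rules.get(".read") == "true":
        hits.append(path or "/")
    for key, child in rules.items():
        if isinstance(child, dict):
            hits.extend(world_readable_paths(child, f"{path}/{key}"))
    return hits


rules_json = """
{
  "rules": {
    ".read": true,
    ".write": "auth != null"
  }
}
"""
open_paths = world_readable_paths(json.loads(rules_json)["rules"])
print(open_paths)  # ['/']
```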

Do I need to create a new SQLite database every time an application is updated?

I have a Xamarin Forms application I would like to develop. It will have a SQLite database and I wish to make this available on iOS and Android. The database will be populated with data from a SQL Server database on the cloud with initial seed data. I'm thinking this will be about 500 rows of data with each row about 1Kb.
What I don't understand is when and how to populate this. Should I try to put the data into a CSV file and have this populate the database when the application is installed, or when it first starts? What's the normal way to populate seed data, other than putting a huge number of insert statements in the code?
Any help or advice on how this is normally done (I'm thinking most people do it the same way) would be much appreciated.
Thanks
Let's break the problem down.
Is the initial data that you wish to use in your app going to change over time?
If you include any pre-populated data (a SQLite, Realm, or CSV-based file, ...) and the data that you are including goes stale and you have to update it on a routine basis, you will need to publish an application update (.apk/.ipa) so your new user installs receive the updated data (more on this below).
Note: This assumes that your current users get the updated data by actually running your app, and that the app handles the local data updates on a routine basis (background service, push notifications, data polling, etc.).
Is this a Line of Business (LoB) application published via Ad-Hoc, private Store, and/or iOS Enterprise publishing?
If you control the user base, then having to force an update install so your users get your new/updated pre-populated data might be an acceptable approach. It is not a great user experience if they are forced to update the application all the time, but it works.
Is this application going to be distributed via the public Apple and Google App Stores?
This is where you need to be very careful on what pre-populated data you include within your application.
If the data goes stale and you need to push an updated app version to the Stores for your new installs, beware that it could take days (or weeks, or even a month or more) to get that new app into the store.
The Play Store usually takes less than 24 hours to publish app updates, and while the Apple Store can be the same, do not bet on it.
We routinely see 48-72 hour delays and randomly get rejected, so it can take a week or more to get an updated app into the Apple Store. We have had rejections delay an app update for over a month, and have gone into the appeal process and even removed already existing features to get re-published.
Note: Every app update to the Apple store resets your user reviews... :-(
Bottom line: You want to publish to the Stores when you are fixing bugs and/or adding features, not to update some "static" data that is stored within your app bundle.
What does this data cost your end-user and you?
Negative costs to you as an app developer are bad reviews and uninstalls. Look at how this "data" affects the end user's access to your application and how they react. A longer download time is usually acceptable. Longer initial app startup times are less acceptable, and so on.
What markets will your app be used in? Network speeds and the cost of data transfer in many markets across the world are slow and costly...
What really is the true size of the data?
I "pre-populate" a Realm data instance with thousands of rows from 5MB of JSON data in under a second. SQLite takes longer, but it is still not bad. The data itself is stored in a zip file and accessed as a static file (an HTTPS-based GET). At an 80% compression factor, the 1MB of compressed data is pulled from a server (AWS S3) in under one second at LTE cellular data speeds, and uncompressing it as a stream while deserializing the JSON on the fly to update the Realm instance adds another second...
So, the user impact is very small and I "hide" this initial pre-populate update via a first-time welcome screen and some text that the user hopefully reads before getting to the first "real" app screen...
Note: This does assume that the user will have network data access the first time they open the app... In many markets around the world, this is not true, so factor this into your app design.
I also architect the app so its data can be updated on background threads during launch (the initial one or not). That way the user does not stand there watching a spinning busy indicator; they can at least interact with the data that they do have.
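The unpack-and-load step described above might look roughly like this. It is a sketch under several assumptions, not the author's implementation: SQLite stands in for Realm, the zip archive is built in memory instead of being downloaded from a server, the JSON is parsed in one go instead of being streamed, and the table name `items`, the member name `seed.json`, and the function `seed_from_zip` are all hypothetical.

```python
import io
import json
import sqlite3
import zipfile


def seed_from_zip(conn: sqlite3.Connection, zip_bytes: bytes, member: str = "seed.json") -> int:
    """Unpack a zipped JSON array and bulk-insert it once, in a single transaction.

    Returns the number of rows inserted (0 if the table was already seeded).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
    already = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    if already:
        return 0  # seed data is already present; nothing to do
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as fh:
            rows = json.load(fh)
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO items (id, name) VALUES (:id, :name)", rows)
    return len(rows)


# Build a small archive in memory to stand in for the downloaded file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("seed.json", json.dumps([{"id": i, "name": f"row {i}"} for i in range(1000)]))

conn = sqlite3.connect(":memory:")
first = seed_from_zip(conn, buf.getvalue())   # inserts the seed rows
second = seed_from_zip(conn, buf.getvalue())  # already seeded, inserts nothing
```

Doing all the inserts in one transaction is what keeps the SQLite version reasonably fast; committing row by row is usually much slower.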
So should you include any pre-populated data in your app bundle?
Sure, when that data is absolutely required to get the user up and running as fast as possible to enhance the user experience. Games are a great example of this in bundling 100s of megabytes or even gigabytes via .obb... with the various levels, media files, etc... into the app so the user does not experience a 10+ min. wait time upon opening the app the first time.
Now this does mean that their initial download time for the install was longer, as that data was bundled within the app, but the overall user experience was better: users accept the download/install times and view them as a carrier/phone/service plan issue, unlike the time it takes to open your app the first time and actually get to a functional screen.
So what to do?
Personally, I look at this issue on a case-by-case basis. If the data is not going to change, and will only get added to and possibly pruned over time, include it as a pre-populated SQLite or Realm store or... Why make the user wait for the web requests and database updates, and pay for the additional network data usage and associated costs? If the data is going to go stale, do not bundle it in your app.
As for the mechanics of installing pre-populated data:
See my answer on this SO Question about "Bundle prebuilt Realm files"
You don't have to create your sqlite database every time the app is updated.
Actually SQLiteOpenHelper provides the following two methods:
OnCreate(): implement this method to create your SQLite database and populate it with the data from the server. It is called when the database is created for the first time, which normally happens the first time the app is started.
OnUpgrade(): implement this method if you want to modify the database (add a new table, or a column to a table) or populate additional data. It is called when the database version number is increased.
The database is preserved between app updates and you don't need to create it each time.
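The create-once, upgrade-on-version-change pattern can be illustrated outside of Android as well. The sketch below is a Python analogue, not the SQLiteOpenHelper API: it uses SQLite's `PRAGMA user_version` to remember the schema version, and the names `on_create`, `on_upgrade`, `migrate`, and the `items` table are made up for this example.

```python
import sqlite3

SCHEMA_VERSION = 2  # bump this when the schema or the seed data changes


def on_create(conn: sqlite3.Connection) -> None:
    """Runs only for a brand-new database: create the latest schema and seed it."""
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)", [("seed row 1",), ("seed row 2",)])


def on_upgrade(conn: sqlite3.Connection, old_version: int, new_version: int) -> None:
    """Runs when an existing database is older than SCHEMA_VERSION."""
    if old_version < 2:
        conn.execute("ALTER TABLE items ADD COLUMN notes TEXT")


def migrate(conn: sqlite3.Connection) -> int:
    """Create or upgrade the database as needed; returns the version found."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    with conn:
        if current == 0:
            on_create(conn)
        elif current < SCHEMA_VERSION:
            on_upgrade(conn, current, SCHEMA_VERSION)
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    return current


conn = sqlite3.connect(":memory:")
migrate(conn)  # first run: creates and seeds the database
migrate(conn)  # later runs: nothing to do, existing data is preserved
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The second call leaves the two seed rows untouched, which is the point made above: existing data survives, and only a version bump triggers extra work.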
Check the following examples which explain how to use sqlite database with Xamarin:
Using Sqlite in a Xamarin.Android Application Developed using Visual Studio
and
An Introduction to Xamarin.Forms and SQLite

Firebase: queries on large datasets

I'm using Firebase to store user profiles. I tried to put the minimum amount of data in each user profile (following the good practices advised in the documentation about structuring data), but as I have more than 220K user profiles, downloading all of them as JSON still comes to 150MB.
And of course, it will grow bigger and bigger as I intend to have a lot more users :)
I can't do queries on those user profiles anymore because each time I do that, I reach 100% Database I/O capacity and thus some other requests, performed by users currently using the app, end up with errors.
I understand that when using queries, Firebase needs to consider all the data in the list and thus read it all from disk. And 150MB of data seems to be too much.
So is there an actual limit before reaching 100% Database I/O capacity? And what is exactly the usefulness of Firebase queries in that case?
If I simply have small amounts of data, I don't really need queries, I could easily download all data. But now that I have a lot of data, I can't use queries anymore, when I need them the most...
The core problem here isn't the query or the size of the data, it's simply the time required to warm the data into memory (i.e. load it from disk) when it's not being frequently queried. It's likely to be only a development issue, as in production this query would likely be a more frequently used asset.
But if the goal is to improve performance on initial load, the only reasonable answer here is to query on less data. 150MB is significant. Try copying a 150MB file between computers over a wireless network and you'll have some idea what it's like to send it over the internet, or to load it into memory from a file server.
A lot here depends on the use case, which you haven't included.
Assuming you have fairly standard search criteria (e.g. you search on email addresses), you can use indices to store email addresses separately to reduce the data set for your query.
/search_by_email/$user_id/<email address>
Now, rather than 50k per record, you have only the bytes needed to store the email address per record, a much smaller payload to warm into memory.
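To get a feel for the difference, the index can be built and measured offline from a JSON export. This is only a sketch: the profile fields are invented sample data, the index is interpreted here as a map from user ID to email address, and `build_email_index` is a hypothetical helper, not part of any Firebase SDK.

```python
import json


def build_email_index(profiles: dict) -> dict:
    """Map user_id -> email, mirroring /search_by_email/$user_id/<email address>."""
    return {uid: profile["email"] for uid, profile in profiles.items() if "email" in profile}


# Invented sample data standing in for the exported user profiles.
profiles = {
    f"user{i}": {"email": f"user{i}@example.com", "name": f"User {i}", "bio": "x" * 500}
    for i in range(1000)
}

index = build_email_index(profiles)
full_size = len(json.dumps(profiles))   # bytes to scan when querying the full profiles
index_size = len(json.dumps(index))     # bytes to scan when querying the index only

# A lookup now touches only the small index, then fetches one profile by ID.
matches = [uid for uid, email in index.items() if email == "user42@example.com"]
print(matches, full_size, index_size)
```

In a real app the index has to be written alongside each profile update so the two stay in sync.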
Assuming you're looking for robust search capabilities, the best answer is to use a real search engine. For example, enable private backups and export to BigQuery, or go with ElasticSearch (see Flashlight for an example).
