Understand where the bandwidth usage is coming from in Firebase database - firebase

My app is growing in terms of bandwidth usage with Firebase database and I am trying to optimize my queries to use less bandwidth (thus reduce cost) but I am doing this quite blindly because there are no statistics about my database usage (I can't know what queries take the most bandwidth).
Is there somehow a way to know which queries are taking a lot of bandwidth? How do you go about optimizing usage with Firebase database?
Edit:
I have a chat website, and I use observers such as:
messagesRef.child(conversationID).limitToLast(25).on('child_added'...
conversationsRef.child(conversationID).child('participants').on('value'...

The Firebase Profiler saved my life for this. https://firebase.google.com/docs/database/usage/profile
Was able to pinpoint exactly what reference (including children) was hogging the bandwidth, which made it much easier to figure out which part of the code is problematic.

There are no query tuning tools, if that's what you're looking for. You could build in simple time logging that captures timestamps just before and after each query is issued, log that data, and collect it from the clients to narrow down the most poorly performing queries.
It's hard to help without seeing the actual queries or the data model.
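A minimal sketch of the time-logging idea above, written in Python only to keep it self-contained. The QueryTimer class and the query labels are made up for illustration, and the lambdas merely stand in for real database reads:

```python
import time
from collections import defaultdict


class QueryTimer:
    """Records wall-clock duration per query label (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)  # label -> list of durations in seconds

    def timed(self, label, run_query):
        """Run the query callable, store how long it took, return its result."""
        start = time.perf_counter()
        result = run_query()
        self.samples[label].append(time.perf_counter() - start)
        return result

    def slowest(self, top=3):
        """Return the labels with the highest average duration, slowest first."""
        averages = {label: sum(d) / len(d) for label, d in self.samples.items()}
        return sorted(averages, key=averages.get, reverse=True)[:top]


timer = QueryTimer()
# Stand-ins for real database reads; replace with the actual calls.
timer.timed("messages/last25", lambda: time.sleep(0.02))
timer.timed("conversations/participants", lambda: time.sleep(0.001))
print(timer.slowest())
```

Note that this measures latency, not bandwidth, so it only narrows down slow queries. It does not say how many bytes each one transferred.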

In case you are not already using Firebase's .indexOn rule, which the documentation recommends for improving query performance, take a look at Index Your Data.
The Firebase documentation says:
If you know in advance what your indexes will be, you can define them via the .indexOn rule in your Firebase Realtime Database Rules to improve query performance.

I strongly agree with ZagNut's answer.
Logging query completion in then() will help you here.
You can keep a per-node count of requests on the client side and save that count, keyed by client ID, in a separate node of your Firebase database, apart from your main data structure.
Then filter these counts to find usage patterns.
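As a rough sketch of that counting idea (Python for brevity; the UsageTracker class, the paths and the report layout are all assumptions, and the JSON length is only an approximation of the bytes actually transferred):

```python
import json
from collections import Counter


class UsageTracker:
    """Counts requests and approximate payload bytes per database path."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.requests = Counter()
        self.bytes = Counter()

    def record(self, path, payload):
        """Call this from the completion callback of each read."""
        self.requests[path] += 1
        # Serialized JSON length is used as a rough proxy for bandwidth.
        self.bytes[path] += len(json.dumps(payload).encode("utf-8"))

    def report(self):
        """Shape of a record a client could save under a separate stats node."""
        return {
            "clientId": self.client_id,
            "paths": {
                path: {"requests": self.requests[path], "bytes": self.bytes[path]}
                for path in self.requests
            },
        }

    def heaviest(self):
        """Path that accounted for the most approximate bytes."""
        return max(self.bytes, key=self.bytes.get)


tracker = UsageTracker("client-1")
tracker.record("messages/c1", [{"text": "hello"}] * 25)
tracker.record("conversations/c1/participants", {"alice": True, "bob": True})
tracker.record("messages/c1", [{"text": "hello"}] * 25)
print(tracker.heaviest())
```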
Thanks.


Plain JSON vs in-app DB for React-native?

I have a 50-100 MB dataset that users need to have access to. It's static, so it doesn't make sense to host a server for it. There are two kinds of operations I'll perform on the data:
Reading objects by unique ObjectId. Each object is ~3 KB.
Full-text search through ~300,000 strings. Each string is 4-60 characters.
I'm considering storing the data as JSON files. The 300k strings will be stored separately. I'll use https://github.com/nextapps-de/flexsearch or something similar to perform search over it. I've done something similar before with a ~10 MB dataset back in 2016. I used just regex search and it worked flawlessly.
Are there reasons to use RealmDB, SQLite, PouchDB or something else instead of just JSON?
I wish I had asked this question a year ago...
At the office where I currently work we tried creating an app using PouchDB and React Native. We saw PouchDB as an advantage because it wouldn't require our API to send all the data over and over again on every refresh triggered by the user; it would only send the data that had changed since the client's checkpoint. As the data on the server was quite heavy (around 6k entries with more than 200 attributes each), we tried at all costs to go easy on the client's data plan.
Months after this implementation was in place we added a search feature with many different options for sorting and filtering. Not only did we have to throw away our whole PouchDB implementation, we had to start from scratch, replacing all its logic with indexed JSON values. PouchDB performance was extremely slow: it was taking 5 seconds or more to retrieve results, and we just couldn't afford that delay in our scope.
In the end we managed to get a very quick search by running flexsearch over our indexed JSON. Don't make the same mistake we did: PouchDB cost us too much budget and precious time. It was a terrible choice.
Unfortunately I cannot offer proof or more details from a reputable source. I can only share my own terrible experience of thinking we were reaching the end of a project and then having to start from scratch. It was a mess.
Oh boy, a bountied, opinion based question!
I have about 5 years experience with pouchDB specifically, a little with SQLite. I have but a cursory experience with RealmDB - I tried it out and decided it was not a good fit for my hybrid/mobile needs.
pouchDB excels in one area hands down: synchronization/replication, just like its big brother CouchDB. Providing interaction with an offline database that synchronizes with a remote database is huge for many mobile apps. pouchDB is schemaless, leveraging JSON documents. With pouchDB one may choose among several data stores via adapters. As there can be quota headaches [1] for your data size, the right choice may well be the SQLite adapter. pouchDB does not support full text search.
SQLite is what its name implies - a relational database, requiring a schema. An advantage to SQLite is platform support and the size of the database is not subject to quota headaches like web storage (e.g. IndexedDB). SQLite supports full text search, and apps can deploy with a canned database.
Between pouchDB and SQLite lies RealmDB - it is a schema based object database that supports synchronization/replication. Like pouchDB, it does not support full text search.
Now your requirements
Looking up object by id
300k static text
full-text search
I read 'static' to mean immutable.
Since your data does not change and full-text search is required, pouchDB and RealmDB would not be good choices. If there is a requirement to enhance, remove or add to the data, either would make sense as changes to data on a single server would replicate changes to the local database, practically in a seamless fashion.
SQLite might be a reasonable choice since it supports search and it is possible to deploy a canned database with the app. However, SQLite can be slow in hybrid apps.
So,
pouchDB and RealmDB would be massive overkill and not a good fit.
SQLite would add a fair bit of complexity.
For your specific requirements I'd stay on your path, though I have one concern: it appears flexsearch loads its index into memory. If that carries a performance penalty, then SQLite, with its ability to deploy a canned database and its built-in search facility, may prove a reasonable trade-off against the added complexity.
Good luck!
[1] Quota Headaches
I would say it really just depends on whether you want and NEED to leverage the power of relational queries. Because your data is never changing I would use JSON unless you are trying to perform complex comparisons between your data. In your case it sounds like you are just going to be searching for the particular ObjectId so JSON is your best bet especially because you are saying you won't need to change the data later.
If you organize your JSON so that the ObjectIds are in sorted order, you will be able to look them up quickly.
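To illustrate the sorted-ID lookup, here is a small Python sketch using binary search. The records and the find_by_id helper are invented for the example, and it assumes the whole ID list fits in memory:

```python
from bisect import bisect_left


def find_by_id(sorted_ids, records, object_id):
    """Binary-search the sorted id list; return the matching record or None."""
    i = bisect_left(sorted_ids, object_id)
    if i < len(sorted_ids) and sorted_ids[i] == object_id:
        return records[i]
    return None


# Sample data; records are sorted by id once, e.g. when the JSON file is built.
records = sorted(
    [
        {"id": "c3", "name": "gamma"},
        {"id": "a1", "name": "alpha"},
        {"id": "b2", "name": "beta"},
    ],
    key=lambda r: r["id"],
)
sorted_ids = [r["id"] for r in records]

print(find_by_id(sorted_ids, records, "b2"))
```

If the IDs are only ever looked up exactly, a JSON object keyed by ObjectId gives the same result without sorting. The sorted layout mainly helps when the data is split across several files and you need to work out which file holds a given ID.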

Firebase Realtime Database vs Cloud Firestore

Edit: After posting the question I thought I could also make this post a quick reference for those of you who need a quick look at some of the differences between these two technologies, which might help you decide on one of them eventually. I will be editing this question and adding more info as I learn more.
I have decided to use Firebase for the backend of my project. Firestore is described as "the next generation of the realtime database". Now I am trying to decide which way to go: Realtime Database or Cloud Firestore?
Billing:
At first glance, it looks like Firestore charges per number of results returned, number of reads, number of writes/updates, etc. The Realtime Database charges based on the data transmitted; the number of read and write operations is irrelevant. Both also charge for the data stored on Google's servers (I think Firestore is the cheaper one in this respect). Why am I mentioning price? Because from my point of view, although it might carry less weight, it is still a point to consider when choosing one over the other.
Scaling:
Cloud Firestore seems to scale horizontally seamlessly. I think this is not possible with the Realtime Database.
Edit:
In the Realtime Database, you need to shard your data yourself using multiple databases. And you can only do this if you are on the Blaze pricing plan.
ref: https://firebase.google.com/docs/database/usage/sharding
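To make "sharding it yourself" a bit more concrete, here is one possible routing sketch in Python. Hash-based routing is my assumption, not something the linked page prescribes (a lookup table from customer or region to database works too), and the instance URLs are placeholders:

```python
import zlib

# Hypothetical database instance URLs; real ones come from the Firebase console.
SHARDS = [
    "https://example-shard-0.firebaseio.com",
    "https://example-shard-1.firebaseio.com",
    "https://example-shard-2.firebaseio.com",
]


def shard_for(key, shards=SHARDS):
    """Map a key to one shard using a hash that is stable across runs."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


# Every client must route the same key to the same database instance.
print(shard_for("conversation-42"))
```

The catch with plain modulo hashing is that changing the number of shards moves most keys, so it only suits setups where the shard count is fixed up front.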
Performance & Indexing:
Another thing is that the data structure differs between the two. The Realtime Database stores the data as a JSON object, structured in whatever way we choose. Firestore structures the data as collections and documents. Hence the querying also differs between the two.
I think Firestore does automatic indexing, which greatly increases read performance (at some cost to write performance). I am not sure if this is also the case with the Realtime Database.
Edit:
The real-time database does not automatically index your data. You need to do it yourself after a solid inspection of your data and your needs.
ref: https://firebase.google.com/docs/database/security/indexing-data
What other differences can you think of?
What would be (or has been) your choice for different types of projects?
Do you still go with the real-time database or have you migrated from that to the firestore? If so why?
And one last thing. How would you compare the SDKs of these two?
Thanks a lot!
What other differences can you think of?
Here is what I think. I have about 6 months of experience with the Realtime Database, and the main difference is that Firestore makes sorting data easy. For example, I want to retrieve user names ordered by timestamp:
Query firstQuery = firestore.collection("Names").orderBy("timestamp", Query.Direction.DESCENDING).limit(10); // load 10 names
What would be (or has been) your choice for different types of projects?
For me, the Realtime Database is for data streaming: when I work with Arduino, I want to store drone speed.
And Firestore is for smart-office use cases, like air conditioners or room lights, and enterprise use cases like inventory quantities, etc.
Do you still go with the real-time database or have you migrated from that to the firestore? If so why?
I still go with the Realtime Database because I need a tree structure for displaying streaming data, rather than querying table-like collections as in Firestore.

Firestore database model for Notion-like modules [duplicate]

I have watched videos and read the documentation for Cloud Firestore, from Google's Firebase service, but coming from the Realtime Database I can't figure this out.
I have a web app in mind in which I want to store my providers for different categories of products. I want to perform a search query across all my products to find which providers I have for a given product, and eventually access that provider's info.
I am planning to use this structure for this purpose:
Providers ( Collection )
    Provider 1 ( Document )
        Name
        City
        Categories
    Provider 2
        Name
        City
Products ( Collection )
    Product 1 ( Document )
        Name
        Description
        Category
        Provider ID
    Product 2
        Name
        Description
        Category
        Provider ID
So my question is, is this approach the right way to access the provider info once I get the product I want?
I know this is possible in the Realtime Database: using the provider ID, I could look up that provider in the providers section. With Firestore I am not sure if it's possible or if this is the right approach.
What is the correct way to structure this kind of data in Firestore?
You need to know that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. The best and correct solution is the solution that fits your needs and makes your job easier. Bear also in mind that there is also no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use-cases that your app requires. This means that what works for one app, may be insufficient for another app. So there is not a correct solution for everyone. An effective structure for a NoSQL type database is entirely dependent on how you intend to query it.
The way you are structuring your data looks good to me. In general, there are two ways in which you can achieve the same thing. The first is to keep a reference to the provider in the product document (as you already do); the second is to copy the entire provider object into the product document. This second technique is called denormalization and is quite a common practice when it comes to Firebase. We often duplicate data in NoSQL databases to suit queries that may not be possible otherwise. For a better understanding, I recommend you see this video, Denormalization is normal with the Firebase Database. It's for Firebase Realtime Database but the same principles apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing you need to keep in mind. In the same way that you add data, you need to maintain it. In other words, if you want to update or delete a provider object, you need to do it in every place where it exists.
You might wonder now, which technique is best. In a very general sense, the best way in which you can store references or duplicate data in a NoSQL database is completely dependent on your project's requirements.
So you should ask yourself some questions about the data you want to duplicate or simply keep as references:
Is the data static or will it change over time?
If it does, do you need to update every duplicated instance of the data so they all stay in sync? This is what I have also mentioned earlier.
When it comes to Firestore, are you optimizing for performance or cost?
If your duplicated data needs to change and stay in sync at the same time, then you might have a hard time in the future keeping all those duplicates up to date. It might also mean you spend a lot of money keeping all those documents fresh, as every change will require a read and a write for each document. In this case, holding only references will be the winning variant.
In this kind of approach, you write very little duplicated data (pretty much just the Provider ID). So that means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you will need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but definitely does require more code and more API calls.
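A small Python sketch of the reference-only approach just described. Plain dictionaries stand in for the two collections, and the sample data and the get_document helper are invented for illustration. The point is only that resolving a product's provider costs two reads:

```python
# In-memory stand-ins for the two collections; keys play the role of document IDs.
providers = {
    "provider1": {"name": "Acme", "city": "Lisbon", "categories": ["tools"]},
}
products = {
    "product1": {"name": "Hammer", "category": "tools", "provider_id": "provider1"},
}

read_log = []  # one entry per simulated document read


def get_document(collection, doc_id):
    """Simulate a single document read and record that it happened."""
    read_log.append(doc_id)
    return collection[doc_id]


def product_with_provider(product_id):
    """Read the product first, then follow its provider_id with a second read."""
    product = get_document(products, product_id)
    provider = get_document(providers, product["provider_id"])
    return product, provider


product, provider = product_with_provider("product1")
print(provider["city"], len(read_log))
```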
If you need your queries to be very fast, you may prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. But you may also be able to depend on local client caches to make this cheaper, depending on the data the client has to read.
In this approach, you duplicate all data for a provider in each product document. This means that the code to write this data is more complex, and you're definitely storing more data: one more provider object for each product document. And you'll need to figure out if and how to keep it up to date in each document. But on the other hand, reading a product document now gives you all the information about the provider in one read.
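The duplicated variant can be sketched the same way (Python, in-memory dictionaries, made-up sample data and a hypothetical update_provider helper). Reading a product needs no second lookup, but changing a provider has to touch every copy:

```python
import copy

providers = {"provider1": {"name": "Acme", "city": "Lisbon"}}

# Each product embeds its own copy of the provider document.
products = {
    "product1": {
        "name": "Hammer",
        "provider_id": "provider1",
        "provider": copy.deepcopy(providers["provider1"]),
    },
    "product2": {
        "name": "Saw",
        "provider_id": "provider1",
        "provider": copy.deepcopy(providers["provider1"]),
    },
}


def update_provider(provider_id, changes):
    """Apply changes to the provider and every embedded copy; return the write count."""
    providers[provider_id].update(changes)
    writes = 1
    for product in products.values():
        if product["provider_id"] == provider_id:
            product["provider"].update(changes)
            writes += 1
    return writes


writes = update_provider("provider1", {"city": "Porto"})
print(writes, products["product2"]["provider"]["city"])
```

With two products the single logical change becomes three writes, and that count grows with every product that embeds the provider.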
This is a common consideration in NoSQL databases: you'll often have to consider write performance and disk storage vs. reading performance and scalability.
For your choice of whether or not to duplicate some data, it is highly dependent on your data and its characteristics. You will have to think that through on a case-by-case basis.
So in the end, remember that both are valid approaches, and neither of them is inherently better than the other. It all depends on what your use-cases are and how comfortable you are with this new technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. Unfortunately, in return you have more complex updates and higher storage/memory usage. Note also that extra calls are not expensive in the Firebase Realtime Database, but they are in Firestore. How much data duplication versus how many extra database calls is optimal for you depends on your needs and your willingness to let go of the "Single Point of Definition" mindset, which is very subjective.
After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's a trade-off between these two and your needs that determines the optimal solution for your app. Furthermore, to be even more precise you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation but that's software development. Everything is about measuring things.
Remember also that some database structures are easier to protect with security rules than others. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.
Please also take a look at my answer from this post where I have explained more about collections, maps and arrays in Firestore.

DynamoDb - How to do a batch update?

Coming from a relational background, I'm used to being able to write something like:
UPDATE Table Set X = 1 Where Y = 2
However such an operation seems very difficult to accomplish in a db like Dynamodb. Let's say I have already done a query for the items where Y = 2.
The way I see it, with the API provided there are two options:
Do lots and lots of individual update requests, OR
Do a batch write and write ALL of the data back in, with the update applied.
Both of these methods seem terrible, performance-wise.
Am I missing something obvious here? Or are non relational databases not designed to handle 'updates' on this scale - and if so, can I achieve something similar without drastic performance costs?
No, you are not missing anything obvious. Unfortunately those are the only options you have with DynamoDB, with one caveat: a BatchWrite can only submit 25 write requests at a time (puts or deletes, which is why the whole item has to be written back), so you may still have to issue multiple BatchWrite requests.
A BatchWrite is more of a convenience that the DynamoDB API offers to help you save on network traffic, reducing the overhead of 25 requests to the overhead of one, but otherwise it doesn't save much.
The BatchWrite API is also a bit less flexible than the individual Update and Put item APIs so in some situations it's better to handle the concurrency yourself and just use the underlying operations.
Of course, the best scenario is if you can architect your solution such that you don't need to perform massive updates to a DynamoDB table. Writes are expensive and if you find yourself frequently having to update a large portion of a table, chances are that there is an alternative design that could be employed.
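A sketch of the "write everything back in batches" option with the 25-item limit mentioned above, in Python. It only prepares the batches; the item layout and helper names are made up, and no DynamoDB client is called:

```python
MAX_BATCH_SIZE = 25  # per-request item limit mentioned above


def chunked(items, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_update(items, field, value):
    """Return copies of the items with one field overwritten."""
    return [{**item, field: value} for item in items]


# Pretend these 60 items came back from the query for Y = 2.
queried = [{"id": n, "Y": 2, "X": 0} for n in range(60)]
batches = list(chunked(apply_update(queried, "X", 1)))
print([len(b) for b in batches])  # → [25, 25, 10]
```

Each batch would then go out as one batch write request. A real implementation also has to resend any items the service reports back as unprocessed.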

Firebase Fan out - the most cost effective way?

I know this issue may have been raised multiple times, but I have read most of the available questions and did not find any that exactly answers mine. As proposed by the Firebase team, the fan-out technique is the recommended way to ensure fast data reads, but at the cost of data duplication. I know this question is subjective and depends on the application, but which is the best solution in terms of cost saving ($) and data read?
Post the same node under multiple children (the data is read with a single call, but it is redundant and so consumes more Firebase storage) (see image Firebase Database - the "Fan Out" technique)
Post the node only once, and have other nodes reference it by its key (not redundant and consumes less Firebase storage, but needs two reads: one to get the key and one to get the node for that key) (see image https://stackoverflow.com/a/38215398/1423345)
For context, I am building a non-profit marketplace app, so I need the solution that best balances cost saving ($) and fast data reads.
In other words, reading twice (bandwidth) vs. bigger storage: which one is more cost-effective?
I would start by saying that ideally in Firebase you read or sync only what's necessary. So your database queries are coupled with other filters to make each query as specific as possible. If you can nail that, you will end up with a very intelligent data structure which will be cost-effective anyway.
Now the real debate: the fan-out technique or just posting references to the nodes. I personally prefer fan-out and use it successfully, so I will answer in reference to that technique only, which will also indicate the reasons I don't want to keep just references.
The first and foremost thing is end-user experience and performance, which comes in the form of big data chunk synchronization. In general, it means that instead of downloading many small chunks you aim for the biggest possible chunk, so that you reduce cell radio usage, battery drain and bandwidth, and also keep the app updated and in sync as fast as possible.
If you aim for that kind of app performance, then fan-out is the clear winner over the other technique for the following reasons.
You download one big data chunk stored in another node, so your cell radio does not stay on for long.
As you download the whole info at once, your app performs better than others. Obviously, by whole I don't mean that you should download the full database. It's all about the smart balance that makes you download just what is required in the first go.
It's not that this is the only technique which will give you faster reads and better data structure. There are other techniques like indexing, data validation and security rules which are equally important. All coupled up properly with correct data structure will give you far better performance.
If you have just a reference to another node and not the actual data, you might end up in a situation where you don't actually have anything to show to your users. Let's say your users don't have good connectivity, so after the one read which gave you just the reference, the network drops. Until the network is up again your users don't see anything, and trust me, that is a very bad situation for the app. Your aim as a developer should be to reduce the chances of those situations.
So, I would recommend you go for the fan-out technique, as it is faster and cost-effective when you take other factors like data filtering, indexing and security rules into account as well. Yes, it comes at the slight price of higher storage usage. But what does less storage mean when you don't have happy users? Still, it all comes down to personal preference. I have shared my experience and thoughts, and I hope they help you make the right decision.
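To compare the two layouts from the question side by side, here is a Python sketch that builds the path-to-value map for each. The posts/ and timelines/ paths and the sample listing are assumptions made for the example, and the JSON length is only a rough stand-in for stored size:

```python
import json


def fan_out(post_id, post, follower_ids):
    """Build one path -> value map that writes the full post to every location."""
    updates = {f"posts/{post_id}": post}
    for follower_id in follower_ids:
        updates[f"timelines/{follower_id}/{post_id}"] = post
    return updates


def reference_only(post_id, post, follower_ids):
    """Store the post once; the other locations hold only the key."""
    updates = {f"posts/{post_id}": post}
    for follower_id in follower_ids:
        updates[f"timelines/{follower_id}/{post_id}"] = True
    return updates


post = {"title": "Bike", "price": 40}
duplicated = fan_out("p1", post, ["u1", "u2"])
referenced = reference_only("p1", post, ["u1", "u2"])

# Rough size comparison of what each layout would store.
print(len(json.dumps(duplicated)), len(json.dumps(referenced)))
```

With fan-out a reader of timelines/u1 gets the full post in one read. With references it gets only the key and needs a second read of posts/p1, which is the storage versus bandwidth trade-off being weighed here.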
I would encourage you to go through this to gain a deeper understanding of NoSQL data modelling.
Do let me know if this info helped you.
