I want to use Firestore in my app because its scaling limit is 1 million concurrent connections. I have found the pricing to be quite high, especially compared with the Realtime Database, but I cannot use the Realtime Database as it only scales to around 200k.
I was wondering whether I could use Firestore, accessed directly from the client side, for the data that needs live document listeners, and use the Realtime Database to store larger chunks of data that will be queried indirectly through Firebase Functions.
My question is:
If the only way to read/write the Realtime Database is through a Cloud Function that is called by the client side, will this only count as 1 concurrent connection, since the client side is not directly connected to it?
Thank you
but cannot use [Realtime Database] as it only scales to around 200k.
Keep in mind that this is per database instance. On a paid project, you can create additional database instances to scale much further (even beyond the 1m concurrents that Firestore supports), as long as you are able/willing to define how to distribute your users over the database instances (commonly referred to as a "sharding strategy").
On your actual question: each Cloud Functions instance counts as a single connection to the database. Keep in mind here that Cloud Functions auto-scale, so you will have as many connections from Cloud Functions as you have concurrently running Cloud Functions instances. So while it may well be more than a single connection, it is extremely unlikely you'll reach the limit of 200K connections through this means.
I've been searching for the concurrent users limit for the Cloud Firestore Spark plan but couldn't find it.
https://firebase.google.com/docs/firestore/quotas
It did say there is a 1.000.000 concurrent users limit, but did not mention whether that is for the Spark plan or the Blaze plan. I've also tried searching for an answer elsewhere, but did not find it answered specifically (with a source).
Help would be appreciated, thank you.
Per the Cloud Firestore pricing information (which Firebase uses):
When you use Firestore, you are charged for the following:
The number of documents you read, write, and delete.
The amount of storage that your database uses, including overhead for metadata and indexes.
The amount of network bandwidth that you use.
There is also no mention of any connection limits on Firebase's pricing page or the quotas documentation that you linked.
Unlike the Realtime Database, Cloud Firestore does not charge on a per-connection basis.
This video series also covers the ins and outs of Firebase products and is well worth sitting through.
Think of Cloud Firestore like a folder on your computer, which can contain thousands of little text files, similar to how documents in Cloud Firestore are stored. Users can update them with little chance of collision and grabbing a single document file would only require feeding 1s and 0s back to the requestor. This is why you are charged for network bandwidth rather than by individual connection.
In comparison, the RTDB is similar to one large JSON text file, with many people all trying to update it at once. Because the server has to parse this file to read and write data, every operation requires compute resources. For this reason (among others), the number of connections the RTDB manager processes handle on behalf of Spark plans is limited to prevent abuse.
Hey, so with my current feed database design, I am using Redis as the cache for super-fast reads, which are routed through my Google Cloud Functions. The Redis database handles all post data and timeline updates, which is great and all, but I forgot one of the most considerable caveats to this. Firebase Firestore only permits about one write per second to a given document, meaning that if I have a document that stores the post data (post_id, user_id, content, like_count), the like_count would be impossible to track when there can be many likes per second. Does anyone have any solutions to this?
You can shard your counter among multiple documents and query them in aggregate as needed.
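As a rough illustration of the sharding idea, here is a minimal in-memory sketch. It does not call Firestore: the `ShardedCounter` class and its list of integers are hypothetical stand-ins for N shard documents that each hold a `count` field. In a real app each increment would be a write to one randomly chosen shard document, and the total would come from reading all of them. Assuming the roughly one write per second per document limit from the question, N shards should sustain on the order of N likes per second.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across N shard documents."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each entry plays the role of one shard document's `count` field.
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> int:
        """Add to one randomly chosen shard and return that shard's index."""
        index = random.randrange(len(self.shards))
        self.shards[index] += amount
        return index

    def total(self) -> int:
        """Aggregate read: sum every shard to get the overall like count."""
        return sum(self.shards)


counter = ShardedCounter(num_shards=10)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

The trade-off is that reading the like count now costs one read per shard, so it is worth picking the smallest shard count that covers the expected peak write rate.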
You can also try a Cloud Tasks queue to smooth out the write frequency. It will add considerable complexity to the system, but it is really the only generic way in GCP to manage the rate of some work. This might not work out the way you need, however.
If you use Cloud Tasks, your task will need to be configured with a rate limit, and it will have to deliver the document data to write to yet another function or other HTTP endpoint that will perform the write.
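To make the "smooth out the write frequency" idea concrete, here is a small sketch of the general pattern. It is not Cloud Tasks and uses no GCP API: `CoalescingWriter` is a hypothetical in-process buffer that accumulates likes and emits at most one combined write per interval, which is the effect a rate-limited queue feeding a writer function is meant to achieve. The clock is passed in explicitly so the behaviour is easy to follow.

```python
class CoalescingWriter:
    """Buffers increments and emits at most one write per interval."""

    def __init__(self, min_interval_s: float = 1.0) -> None:
        self.min_interval_s = min_interval_s
        self.pending = 0          # likes not yet written
        self.last_flush = None    # timestamp of the last emitted write
        self.writes = []          # (timestamp, amount) pairs actually written

    def add(self, amount: int, now: float) -> None:
        """Record new likes and write them out if the interval has elapsed."""
        self.pending += amount
        self.flush_if_due(now)

    def flush_if_due(self, now: float) -> bool:
        """Emit one combined write if allowed; return True if a write happened."""
        if self.pending == 0:
            return False
        too_soon = (
            self.last_flush is not None
            and now - self.last_flush < self.min_interval_s
        )
        if too_soon:
            return False
        self.writes.append((now, self.pending))
        self.pending = 0
        self.last_flush = now
        return True


writer = CoalescingWriter(min_interval_s=1.0)
for i in range(50):
    writer.add(1, now=i * 0.02)   # 50 likes arriving within one second
writer.flush_if_due(now=1.0)      # next interval: flush what accumulated
print(len(writer.writes))         # 2
```

Fifty likes become two document writes here. The cost is latency: the stored like_count lags the true value by up to one interval.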
Our team is developing a mobile app and currently uses (Firebase) Firestore for our backend. We wrapped every DB access in Firebase Functions in order to clean up the object returned to the client app.
Does this approach introduce any (additional) non-negligible overhead compared to accessing Firestore directly?
Yes and no, depending on your use case.
If you have a small number of users with relatively low usage (in terms of the given quota), using Cloud Functions is recommended. As stated in the documentation, Cloud Functions for Firebase offers generous quotas for resource limits, time limits, and rate limits, with good pricing, especially on the free Spark plan.
The advantage of Cloud Functions is fast, scalable compute, which can shorten the processing time of a given function compared with running it on the phone's CPU. Some phones have little computing power (not every user owns a high-spec phone), so moving that work to Cloud Functions can give a better user experience (UX).
Note: I do agree with Doug that cost is one of the factors, but we should also consider performance and other perspectives.
Yes, at the very least, now your path to get data has two hops instead of one. Before, you directly accessed the database using a channel that's optimized for returning the query results. Now, you have to pay the cost of an additional hop to Cloud Functions, which makes the query. And it's possible that the results returned to the client are bigger than if you made the query directly.
Perhaps the biggest loss you'll experience is the client side caching of documents that's automatically performed by the client (enabled by default on Android and iOS). If you repeat a query and none of the documents have changed, you get immediate results from the cache instead of having to wait for the server. And you won't have to pay for document reads for cache hits. So, if you aren't also caching your results, you're also paying the monetary cost of Cloud Functions and the query to Firestore for every request.
Yes, but the answer could be different based on the situation.
If a client wants to fetch a record exactly as it is in the database, the Firebase SDK might be faster because there is no overhead of calling Firebase Functions.
If heavy processing is needed after fetching a record, then Firebase Functions + Firebase Admin SDK could be faster, because the processing unit in Firebase Functions could be faster than a mobile CPU. However, if the direct request responds faster, the client app could display a message that the data was fetched and is being processed, and the user experience could still be acceptable.
The only case I can come up with where Firebase Functions could always win is when the server reduces the data size enough that the overhead introduced by Firebase Functions (including processing time) is compensated by the shorter network delay. This also has the advantage of saving the client's data plan.
The application I'm currently working on needs real-time communication that is scalable. We have been looking into and tried out Firebase real-time database and firestore. It seems Firebase real-time database is more mature and tested out, while firestore is still in beta, which is why we are leaning towards the real-time database.
We are however worried about its scaling capabilities in our context. Our queries will mainly be geo spatial based on the user's location. According to Firebase simultaneous realtime connections to my database and https://firebase.google.com/pricing/#faq-simultaneous the maximum number of concurrent users is 100.000, which will be too low for our needs.
According to their documentation, it seems like database sharding is the way to scale beyond 100.000 concurrent users https://firebase.google.com/docs/database/usage/sharding. Since our queries are based on the user's location, we could group the data into regions, e.g. US West, US Central, and US East and have a database instance for each of those three regions.
While this method may work, it seems very cumbersome to set it up. We would probably need to have a service which the user initially connects to in order to be redirected to the correct database instance that fits the region which the user is in. Additionally, it should handle the case where a user moves into another region, and should therefore be redirected to another database instance containing the data for that specific region.
Another complex task would be to distribute the data into the correct database instances.
Is there a more simple approach to scale beyond 100.000 users or is it possible to increase the amount of concurrent connections for a single Firebase real-time database?
To me it seems like almost a waste to use Firebase if it requires you to do so much "load" balancing yourself.
The 100K concurrent connections is a hard cap on the Firebase Realtime Database.
The approach you describe with a two-step connect is quite idiomatic. The first step is usually quite simple. In fact for many apps it is part of their authentication flow, or based on the outcome of that. For example, many apps base the user's shard on a hash of their UID.
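A minimal sketch of the "hash of their UID" idea, under stated assumptions: the instance URLs below are hypothetical placeholders, and SHA-256 is just one reasonable choice of stable hash (Python's built-in `hash()` is randomized per process, so it would not give the same shard across runs).

```python
import hashlib
from typing import Sequence

# Hypothetical database instance URLs; a real project would list its own.
INSTANCES = [
    "https://example-shard-0.firebaseio.com",
    "https://example-shard-1.firebaseio.com",
    "https://example-shard-2.firebaseio.com",
]


def shard_for_uid(uid: str, num_shards: int) -> int:
    """Map a user ID to a shard index that is stable across runs and devices."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards


def instance_for_uid(uid: str, instances: Sequence[str] = INSTANCES) -> str:
    """Return the database instance URL this user should connect to."""
    return instances[shard_for_uid(uid, len(instances))]


print(instance_for_uid("some-user-uid"))
```

Note that a plain modulo reassigns most users whenever the number of instances changes, so this fits best when the shard count is fixed up front.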
In your case, you could inject the user's region into their token as a custom claim when they register. Then you'd read that claim when they sign in, and redirect them to their shard. You could also persist the shard info in the client when it first connects, so that you only have to determine it once for each client/device.
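The lookup step described above might look roughly like this. Everything named here is an assumption for illustration: the claim key `region`, the region names, the instance URLs, and the fallback region are all hypothetical, and the claims are modelled as a plain dictionary rather than a decoded ID token.

```python
from typing import Mapping, Optional

# Hypothetical region-to-instance table.
REGION_INSTANCES = {
    "us-west": "https://example-us-west.firebaseio.com",
    "us-central": "https://example-us-central.firebaseio.com",
    "us-east": "https://example-us-east.firebaseio.com",
}
DEFAULT_REGION = "us-central"  # assumed fallback when the claim is missing


def instance_for_claims(
    claims: Mapping[str, object],
    cached_url: Optional[str] = None,
) -> str:
    """Pick the database instance for a signed-in user.

    A previously persisted URL wins, so the lookup runs once per device.
    """
    if cached_url is not None:
        return cached_url
    region = str(claims.get("region", DEFAULT_REGION))
    # Unknown regions fall back to the default instance.
    return REGION_INSTANCES.get(region, REGION_INSTANCES[DEFAULT_REGION])


print(instance_for_claims({"region": "us-east"}))
# https://example-us-east.firebaseio.com
```

If users can move between regions, the persisted value would need to be cleared or re-checked when the claim changes; this sketch does not handle that.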
Is there a more simple approach to scale beyond 100.000 users or is it
possible to increase the amount of concurrent connections for a single
Firebase real-time database?
Yes: use Firestore.
Scales completely automatically. Currently, scaling limits are:
Around 1 million concurrent connections and 10,000 writes/second. (they plan to increase these limits in the future) (source)
Maximum write rate to a document is 1 per second (source)
Is officially out of beta and in General Availability from 31/1/2019 (source)
I am wondering whether it is a sound strategy to use the firebase offline capabilities as a "free" cache.
Let's assume that I am in activity A, I fetch some data from Firebase, and then I move to activity B, which needs the same data. If the app is configured with setPersistenceEnabled(true) and, if necessary, also with keepSynced(true), can I just re-query the same data in activity B, rather than passing it around?
I understand that there is a difference between the two approaches regarding reading-from-memory and reading-from-disk (firebase offline cache). But do I really get rid of all the network overhead by using firebase offline?
Relevant links:
Firebase Offline Capabilities and addListenerForSingleValueEvent
https://groups.google.com/forum/#!msg/firebase-talk/ptTtEyBDKls/XbNKD_K8CQAJ
Yes, you can easily re-query your Firebase Database in each activity instead of passing data around. If you enable disk persistence, this will be a local read operation. But since you attach a listener (or keep it attached through keepSynced()), it will cause network traffic.
But don't use Firebase as an offline-only database. It is really designed as an online database that can work for short to intermediate periods of being disconnected. While offline, it keeps a queue of pending write operations. As this queue grows, local operations and app startup will slow down. Nothing major, but over time these may add up.