Firebase database strategy for simultaneous connections of 100k+?

Suppose you implement a Firebase backend service that goes insanely viral, growing at over 40% per 10 days, and you hit the Blaze plan quota of no more than 100,000 simultaneous connections. What working strategies could be implemented with Firebase for handling more than 100k simultaneous connections? 1M? 100M? 1B?!

100k concurrents is quite a lot of concurrent connections, but not out of the question for large apps; we've had applications with more than 1M.
In general, the strategy for doing this involves sharding data across multiple databases. This is pretty trivial if the data is all independent (e.g. a per-user todo list), since you can assign a user to a database and never have to sync across databases.
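For illustration, here is a minimal TypeScript sketch of that idea, assuming the modular Firebase JS SDK and hypothetical shard URLs: each client hashes the UID and always connects to the same database instance, so independent per-user data never needs cross-database syncing.

```typescript
import { initializeApp } from "firebase/app";
import { getDatabase, Database } from "firebase/database";

// Hypothetical shard URLs: one Realtime Database instance per shard.
const SHARD_URLS = [
  "https://myapp-shard-0.firebaseio.com",
  "https://myapp-shard-1.firebaseio.com",
  "https://myapp-shard-2.firebaseio.com",
];

// Cheap, stable string hash (FNV-1a). Any deterministic hash works,
// as long as every client computes the same shard for the same UID.
function fnv1a(str: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

const app = initializeApp({ /* your project config */ });

// All reads/writes for a given user go to the same instance.
function databaseForUser(uid: string): Database {
  return getDatabase(app, SHARD_URLS[fnv1a(uid) % SHARD_URLS.length]);
}
```

Note that adding a shard later changes the mapping for existing users, so in practice you'd either pick a shard count with headroom up front or store each user's assigned shard (see the custom-claim approach further down).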
Read-only data (such as 1:N chat) is generally also pretty straightforward: you can apply the same "allow a user to connect to any of N copies of the same database" approach, and have a single user/job update all copies with the same information.
For more complicated schemes (particularly 1:1 chat), allowing users to connect to any database and then syncing data across databases using Cloud Functions or another system is probably the recommended approach.
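As a rough sketch of that Cloud Functions approach (instance names and paths are hypothetical, and this assumes the first-generation firebase-functions API), a trigger on one database instance can mirror each new chat message to the other instances:

```typescript
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

// Hypothetical: the other Realtime Database instances to mirror into.
const OTHER_SHARDS = [
  "https://myapp-shard-1.firebaseio.com",
  "https://myapp-shard-2.firebaseio.com",
];

// Listen on one instance and fan each new message out to the rest.
export const fanOutMessage = functions.database
  .instance("myapp-shard-0")
  .ref("/chats/{chatId}/messages/{messageId}")
  .onCreate(async (snapshot, context) => {
    const { chatId, messageId } = context.params;
    await Promise.all(
      OTHER_SHARDS.map((url) =>
        admin
          .app()
          .database(url)
          .ref(`/chats/${chatId}/messages/${messageId}`)
          .set(snapshot.val())
      )
    );
  });
```

You'd deploy one such function per source instance; keying writes by a stable ID (as here) keeps the fan-out idempotent if the function retries.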
If you're expecting to realistically hit 100k+ concurrents, feel free to reach out to our support team with more info on the use case and we're happy to work with you.

Related

Performance difference Firestore through Firebase Functions vs Firestore SDK

Our team is developing a mobile app and currently uses (Firebase) Firestore for our backend. We wrapped every DB access in Firebase Functions in order to clean up the object returned to the client app.
Does this approach introduce any (additional) non-negligible overhead compared to accessing Firestore directly?
Yes and no, depending on your use case.
If you have a small number of users with relatively low usage (in terms of the given quotas), Cloud Functions is a reasonable choice. As stated in the documentation, Cloud Functions offers generous quotas for resource limits, time limits and rate limits, with good pricing, especially on the Spark plan (free).
The advantage of using Cloud Functions is that it provides fast, scalable compute that can shorten the processing time of a specific function compared with running it on the phone's CPU. Not everyone owns a high-spec phone, and some devices have little computing power, so offloading that work to Cloud Functions can provide a better user experience (UX).
Note: I do agree with Doug that cost is one of the factors, but we should also consider performance and other perspectives.
Yes, at the very least, now your path to get data has two hops instead of one. Before, you directly accessed the database using a channel that's optimized for returning the query results. Now, you have to pay the cost of an additional hop to Cloud Functions, which makes the query. And it's possible that the results returned to the client are bigger than if you made the query directly.
Perhaps the biggest loss you'll experience is the client-side caching of documents that the client SDK performs automatically (enabled by default on Android and iOS). If you repeat a query and none of the documents have changed, you get immediate results from the cache instead of having to wait for the server, and you don't pay for document reads on cache hits. If you route everything through Cloud Functions and don't cache the results yourself, you pay the monetary cost of Cloud Functions plus the Firestore query on every request.
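To make the comparison concrete, here is what the direct path looks like with the modular web SDK (collection and field names are hypothetical); the same one-hop query on the Android and iOS SDKs benefits from the default local cache described above.

```typescript
import { initializeApp } from "firebase/app";
import {
  getFirestore,
  collection,
  query,
  where,
  getDocs,
} from "firebase/firestore";

const app = initializeApp({ /* your project config */ });
const db = getFirestore(app);

// Direct SDK access: a single hop to Firestore, with client-side
// caching handled automatically by the SDK where it is enabled.
async function loadOpenTasks(uid: string) {
  const q = query(
    collection(db, "tasks"),
    where("owner", "==", uid),
    where("done", "==", false)
  );
  const snap = await getDocs(q);
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}
```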
Yes, but the answer could be different based on the situation.
If a client wants to fetch a record exactly as it is in the database, the Firebase SDK might be faster because there is no overhead of calling Firebase Functions.
If we have heavy processing to do after fetching a record, then Firebase Functions + the Firebase Admin SDK could be faster, because the processing unit in Firebase Functions can be faster than a mobile CPU. That said, if the fetch itself returns quickly, the client app could display a message that the data has been fetched and is still being processed while the heavy work runs locally, so the user experience could still be acceptable.
The only case I can come up with where Firebase Functions always wins is when the server reduces the data size enough that the overhead introduced by Firebase Functions (including processing time) is compensated by the shorter network transfer. This also has the advantage of saving the client's data plan.
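As a hypothetical sketch of that last case, a callable Cloud Function can fetch the document server-side and return only the fields the client actually renders, trading an extra hop for a smaller payload (names below are made up, and this assumes the first-generation firebase-functions API):

```typescript
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

// Returns a trimmed projection of a profile document.
export const getProfileSummary = functions.https.onCall(
  async (data, context) => {
    if (!context.auth) {
      throw new functions.https.HttpsError(
        "unauthenticated",
        "Sign in first."
      );
    }
    const snap = await admin.firestore().doc(`profiles/${data.uid}`).get();
    const profile = snap.data() ?? {};
    // Strip large or internal fields before sending the response.
    return {
      displayName: profile.displayName,
      avatarUrl: profile.avatarUrl,
    };
  }
);
```

Whether this wins depends on how much smaller the trimmed response is than the raw document, measured against the extra hop and any cold-start latency.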

Firebase Scalability - More than 100k users at the same time and across multiple regions

I know scalability is not an issue in Firebase, which supports up to 100k simultaneous connections (in general).
Based on pricing documentation:
You can create multiple database instances to go beyond the 100K concurrent limit. See the Pricing FAQ for more information.
Question 1: What if there are more than 200k users simultaneously using the same database? Would the other half of the users be unable to query or connect, or would their requests be placed in a queue?
(As a Firebase plan subscriber, I would like to know how Firebase deals with this problem to ensure the quality of service provided to our customers is always top-notch.)
App globalisation is common nowadays, and it is standard practice for companies to run servers across multiple regions to provide better and more stable performance; online games, for example, require low latency.
As of now, a Firebase user is required to set the default location when creating the project, and it cannot be edited afterwards. Issues have even arisen where users realised they had deployed their app to the wrong region and had no clue how to change it.
This represents the country/region of your organisation/company. Your selection also sets the appropriate currency for your revenue reporting. The selected country does not determine the location of your data for Firebase features. Google may process and store Customer Data anywhere Google or its agents maintain facilities.
Question 2: Will (or does) Firebase provide a solution tailored to this practice, i.e. having our database in multiple regions, with a headquarters region and multiple other regions sharing all the databases, functions and auth?
(For now, to have multiple server locations we have to create different projects, and user and data syncing becomes a problem.)
Hope the language does not offend, cheers!
It seems like your question (or at least your assumptions) is based on the Firebase Realtime Database, so I'll answer for that below.
Q1) You can create more than two databases in a single project, each of which allows 100K connections, so you can scale beyond 200K connections. All of these are hosted in the same region though, so you can't use each database for a separate region.
Q2) For a database solution that handles multiple regions, I'd recommend looking at Cloud Firestore. Also see: Cloud Firestore - selecting region to store data?

Firebase Realtime Database - Scaling above 100.000 concurrent connections

The application I'm currently working on needs real-time communication that is scalable. We have been looking into and have tried out both the Firebase Realtime Database and Firestore. The Realtime Database seems more mature and battle-tested, while Firestore is still in beta, which is why we are leaning towards the Realtime Database.
We are, however, worried about its scaling capabilities in our context. Our queries will mainly be geospatial, based on the user's location. According to "Firebase simultaneous realtime connections to my database" and https://firebase.google.com/pricing/#faq-simultaneous, the maximum number of concurrent users is 100.000, which will be too low for our needs.
According to their documentation, database sharding seems to be the way to scale beyond 100.000 concurrent users: https://firebase.google.com/docs/database/usage/sharding. Since our queries are based on the user's location, we could group the data into regions, e.g. US West, US Central and US East, and have a database instance for each of those three regions.
While this method may work, it seems very cumbersome to set up. We would probably need a service which the user initially connects to and which redirects them to the database instance for their region. Additionally, it would have to handle the case where a user moves into another region and should therefore be redirected to the database instance containing that region's data.
Another complex task would be to distribute the data into the correct database instances.
Is there a simpler approach to scaling beyond 100.000 users, or is it possible to increase the number of concurrent connections for a single Firebase Realtime Database?
To me it almost seems like a waste to use Firebase if it requires you to do so much "load balancing" yourself.
The 100K concurrent connections limit is a hard cap on a single Firebase Realtime Database instance.
The approach you describe with a two-step connect is quite idiomatic. The first step is usually quite simple. In fact for many apps it is part of their authentication flow, or based on the outcome of that. For example, many apps base the user's shard on a hash of their UID.
In your case, you could inject the user's region into their token as a custom claim when they register. You'd then get that claim when they sign in and can redirect them to their shard. You could also persist the shard info on the client when they first connect, so that you only have to determine it once per client/device.
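A minimal sketch of that flow, assuming the first-generation firebase-functions API and hypothetical region values: an auth trigger stamps the region claim onto each new user.

```typescript
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

// Stamp the user's region into their token at registration time.
// How you determine the region (signup form, IP lookup, ...) is up to you.
export const tagUserRegion = functions.auth.user().onCreate(async (user) => {
  const region = "us-central"; // placeholder for your region lookup
  await admin.auth().setCustomUserClaims(user.uid, { region });
});
```

On the client, `getIdTokenResult()` exposes `claims.region` after sign-in, so the app can map it to the matching database URL (and persist that choice locally, as suggested above).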
Is there a simpler approach to scaling beyond 100.000 users, or is it possible to increase the number of concurrent connections for a single Firebase Realtime Database?
Yes: use the Firestore database.
It scales completely automatically. Current scaling limits are:
Around 1 million concurrent connections and 10,000 writes/second (they plan to raise these limits in the future) (source)
A maximum write rate to a single document of 1 per second (source); see the sharded-counter sketch below for the usual workaround
Officially out of beta and in General Availability since 31/1/2019 (source)
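For that 1 write/second per-document limit, the workaround described in the Firestore documentation is a distributed (sharded) counter: spread the writes across N shard documents and sum them on read. A minimal sketch with the modular web SDK (collection names hypothetical):

```typescript
import {
  getFirestore,
  collection,
  doc,
  setDoc,
  getDocs,
  increment,
} from "firebase/firestore";

const db = getFirestore();
const NUM_SHARDS = 10;

// Spread writes across NUM_SHARDS shard docs so the aggregate can
// absorb roughly NUM_SHARDS writes/second instead of one.
async function incrementCounter(counterId: string) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS).toString();
  const shardRef = doc(db, "counters", counterId, "shards", shardId);
  await setDoc(shardRef, { count: increment(1) }, { merge: true });
}

// Reading the total means summing all shards.
async function readCounter(counterId: string): Promise<number> {
  const snap = await getDocs(collection(db, "counters", counterId, "shards"));
  return snap.docs.reduce((sum, d) => sum + (d.data().count ?? 0), 0);
}
```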

ASP.NET MembershipProvider - SQL Server vs. Active Directory

There are several options for storing user info when dealing with ASP.NET membership providers. I would like to ask if they are comparable in terms of performance, especially ActiveDirectoryMembershipProvider and SqlMembershipProvider when there are, say, 100,000 users recorded.
Both providers can handle the workload; the question is whether the infrastructure underneath can handle it. An AD server with 100,000 accounts should be big enough to handle it.
So, the real question in my eyes is: are you writing the app for an intranet and do you want to provide SSO functionality? Then, by all means, go with Active Directory!
Your question is unanswerable, as "performance" depends greatly upon many factors: for instance, network speed, network latency, network saturation, the power of your AD server vs. your SQL Server, the disk subsystems in use in either, etc.
There is no way to say one way or the other without thoroughly evaluating each environment, and even at that point, you should just benchmark each and determine what works best for you.
In most cases, though, the decision between SQL and AD has nothing to do with performance and everything to do with the features offered by each. I would strongly doubt you have 100,000 users in your Active Directory, as that would cost millions of dollars in licensing.

How robust is the asp.net profile system? Is it ready for prime time?

I've used asp.net profiles (using the AspNetSqlProfileProvider) for holding small bits of information about my users. I started to wonder how it would handle a robust profile for a large number of users. Does anyone have experience using this on a large website with large numbers of simultaneous users? What are the performance implications? How about maintenance?
Running against this via SQL is, I have found, a bit tricky, but I have worked with clients that scaled it up to a few hundred properties and 10K+ users without difficulty. Granted, that is not a lot of users, but it is working thus far.
I think it really depends on the specific project and your exact needs when it comes to working with the profile information. Do you need to query it regularly via SQL? Do you need it for user display only? Answering these kinds of questions will help give a more solid answer for your needs.
The SQL provider's performance is closely tied to big-iron throughput: performance is more or less directly proportional to a single SQL Server's ability to handle the query volume. Scale-up is the only option, so as such it's not really five-nines robust out of the box.
You'll have to figure out whether you need scale-out performance and availability, e.g. through partitioning, replication, redundancy and so on, and at what cost to performance. Some of these capabilities are possible as-is; the current implementation is aimed more at the middle market and enterprise.
The good thing is that you can supply your own implementation of the profile provider and attach it to services and systems with the capabilities outlined above.
We wrote custom authn, authz and profile providers and strapped them to a large AD/LDS LDAP cluster across 3 datacenters. We're in the comScore Top 10, so you could say that we deal with a good slice of the internet every day: thousands of profile queries per second against hundreds of millions of profiles. It can scale with good planning, engineering and operations.
