Best strategy to develop back end of an app with large userbase, taking into account limitations of bandwidth, concurrent connections etc.? - wordpress

I am developing an Android app which basically does this: On the landing(home) page it shows a couple of words. These words need to be updated on daily basis. Secondly, there is an 'experiences' tab in which a list of user experiences (around 500) shows up with their profile pic, description,etc.
This basic app is expected to get around 1 million users daily who will open the app daily at least once to see those couple of words. Many may occasionally open up the experiences section.
Thirdly, the app needs to have a push notification feature.
I am planning to purchase a managed wordpress hosting, set up a website, and add a post each day with those couple of words, use the JSON-API to extract those words and display them on app's home page. Similarly for the experiences, I will add each as a wordpress post and extract them from the Wordpress database. The reason I am choosing wordpress is that it has ready made interfaces for data entry which will save my time and effort.
But I am stuck on this: will the wordpress DB be able to handle such large amount of queries ? With such a large userbase and spiky traffic, I suspect I might cross the max. concurrent connections limit.
What's the best strategy in my case ? Should I use WP, or use firebase or any other service ? I need to make sure the scheme is cost effective also.
My app is basically very similar to this one:
https://play.google.com/store/apps/details?id=com.ekaum.ekaum
For push notifications, I am planning to use third party services.
Kindly suggest the best strategy I should go with for designing the back end of this app.
Thanks to everyone out there in advance who are willing to help me in this.

I have never used Wordpress, so I don't know if or how it could handle that load.
You can still use WP for data entry, and write a scheduled function that would use WP's JSON API to copy that data into Firebase.
RTDB-vs-Firestore scalability states that RTDB can handle 200 thousand concurrent connections and Firestore 1 million concurrent connections.
However, if I get it right, your app doesn't need connections to be active (i.e. receive real-time updates). You can get your data once, then close the connection.
For RTDB, Enabling Offline Capabilities on Android states that
On Android, Firebase automatically manages connection state to reduce bandwidth and battery usage. When a client has no active listeners, no pending write or onDisconnect operations, and is not explicitly disconnected by the goOffline method, Firebase closes the connection after 60 seconds of inactivity.
So the connection should close by itself after 1 minute, if you remove your listeners, or you can force close it earlier using goOffline.
For Firestore, I don't know if it happens automatically, but you can do it manually.
In Firebase Pricing you can see that 100K Firestore document reads is $0.06. 1M reads (for the two words) should cost $0.6 plus some network traffic. In RTDB, the cost has to do with data bulk, so it requires some calculations, but it shouldn't be much. I am not familiar with the pricing small details, so you should do some more research.
In the app you mentioned, the experiences don't seem to change very often. You might want to try to build your own caching manually, and add the required versioning info in the daily data.
Edit:
It would possibly be more efficient and less costly if you used Firebase Hosting, instead of RTDB/Firestore directly. See Serve dynamic content and host microservices with Cloud Functions and Manage cache behavior.
In short, you create a HTTP function that reads your database and returns the data you need. You configure hosting to call that function, and configure the cache such that subsequent requests are served the cached result via hosting (without extra function invocations).

Related

Is there a Google API answering about Firestore database either Metrics or Health Checks or Current Active Connectios or Exceptions or Performance

Context: I am total Google Cloud begginer and I have just convinced my company headers to use Firestore Realtime Database for pushing transaction status to our mobile application. We have around 4 millions users that will use significantly our application for small money transfers. Now-a-days we use the concept of polling from Android/IOS to our Microservice endpoints and it will replaced by Firebase SDK imported to our Mobile app which will listen/observe to our Firestore Collection following few Firestore Rules. Since all money transfer will be confirmed/denied in short time (from few seconds to 1 or 2 minutes) the idea of replacing polling by a real reactive approach straigh from Firestore sounded and is already ongoing coding.
The issue: Firstly I don't what to compare solutions. It is just my reality: the prodution support operators must look after our internal Dashboard. Isn't allowed to them look at Google Dashboard Console (please accept this for this question). I need get on demand metrics of our FIrestore. It is nothing to do with Google pricing. It is just our demand: they want to see metrics like:
how many users listening at the same time now
how many users took some exception during connection
is there any user holding connection for more than X minute
when was the connection pick this morning
any exception of any type surrounding our Firestore database
I read Code Samples carefully follow the sample step-by-step trying to figure out some idea if there is some API providing the answers I am looking for.
So, my straight question is: is there such type of Google API providing metrics about my Firestore Database? Maybe following the same idea we found in Performance Monitor which works on Mobile side also some similar aproach on Firestore side.
*** Edited
Future readers may find worth read also about a way to get Firestore metrics info striagh from curl/postman
A couple of things: You mentioned both Firestore and Realtime Database; just wanted to make sure that you are aware that those are two different databases offered under the Firebase umbrella.
how many users listening at the same time now
is there any user holding connection for more than X minute
Yes, there's a dashboard: https://support.google.com/firebase/answer/6317517?hl=en. Including lots of options, like users active in the last 30 mins.
how many users took some exception during connection
any exception of any type surrounding our Firestore database
Yes, you can track errors and other logging via Stack Driver logging. These can give you reports on your cloud functions.
https://cloud.google.com/functions/docs/monitoring
Where can I find Stackdriver in Firebase console?
when was the connection pick this morning
For this one, I'm not sure if you mean A. when did somebody log on in the morning, or B. what was the time that there was the peak \ most usage. If B see 1. If A,
Real-time database has the concept of presence, which lets you know if a user is currently logged in or not. See examples here from the official documentation:
https://firebase.google.com/docs/firestore/solutions/presence
and this post
How to make user presence mechanism using Firebase?
Also applies to your
is there any user holding connection for more than X minute
..............
Edit in response to comments: I believe you are experiencing the XY problem https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem where you are focused on a particular solution, even though your problem has other solutions. User metrics, database events, and errors are all accessible through both dashboards and cloud functions. You can cURL cloud functions if you wish, or set up cron functions to auto report, or set up database trigger functions to log errors. So, while the exact way you want this to work may not exist, you just need to connect existing tools to get the result you want.

How fast can i reach 1GB in firebase realtime database

I am using firebase database and my question is, for example how fast can I reach 1GB if i have 100 users each storing worth 10 document pages of microsoft word full of text everyday, for one month?
Word documents would be stored in Firebase Storage, not the realtime database. Realistically, the only way you will be billed anything for using the Firebase platform is if your app gets a significant of usage. I suspect that 99% of firebase apps do not generate any billing whatsoever. ...that's just a hunch.
If you do run into billing issues, that will/would be a good thing.
Although this question is too broad since it lacks various variables like the number of users, size of the files and how this data is presented in the app I will try to give my $0.02 on this in a very generic way which can also be interpreted as how not to end up with a huge bill while using firebase,
Even though Firebase provides a sufficient space to test out the app in production there is a lot of ways in which things can go bad real quick like,
1) since firebase automatically handles the sync this additional read/write call comes out of your quota apart from the call you trigger check-out how one app developers found this out the hard way
2) if you have bad DB schema/design that you have not addressed, then you end up making multiple calls to the server to fetch the data which again bloats up the number of calls you make read about this here
3) Not setting spending limits and alerts, this should be a mandatory step to avoid a lot of the above problems even though the docs clearly gives an indication on how to set this up
These are some of the cases that I have come across I hope this serves as a guideline to set up your app

Firebase unexpected number of concurrent connections

I just hit a situation which pushed me to ask this question:
I have about 150 active monthly users and I just hit 1k concurrent connections on a single day.
I did research and found many questions on "firebase concurrent connections" topic and those who refers to user-to-concurrent ratio say that on average it's close to 1 concurrent = ~1400 monthly users (like here and here).
I'm now trying to understand if I really did something wrong and if yes, how to fix that?
The questions are:
Is it look ok to get 1k concurrent connections with about 150 active users? Or am I reading it wrong?
Is it possible to profile concurrent connections somehow?
What are the typical "connection leaks" when it comes to chrome extensions and how to avoid them?
So far the architecture of the extension is that all the communication with firebase database is made from the background persistent script which is global to a browser instance.
And as a note, 150 active users is an estimation. For upper boundary I can say that I have 472 user records in total and half of them installed the extension and uninstalled it shortly after that - so they are not using it. And about 20% of the installed instances are also disabled in chrome.
Here is what I get after discussing with the support team:
here are other common use cases that can add up to the number of
connections in your app:
Opening your web app in multiple tabs on your browser (1 connection per tab)
Accessing the Realtime Database dashboard from the Firebase Console (1 connection per tab)
Having Realtime Database triggers
So Realtime Database triggers appeared to be my case.
Further discussion revealed the following:
In the case of uploading 200 data points which each trigger a
function, any number of concurrent connections between 1 and 200 is
possible. It all depends on how Cloud Functions scales. In an extreme
case it could just be one instance that processes all 200 events one
at a time. In another extreme case the Cloud Functions system could
decide to spin up 200 new server instances to handle the incoming
events. There's no guarantee as to what will happen here, Cloud
Functions will try to do the right thing. Each case would cost the
user the same amount on their Cloud Functions bill. What's most likely
in a real application (where it's not a complete cold start) is
something in the middle.
There's no need to worry about the number of concurrent connections
from Cloud Functions to RTDB. Cloud Functions would never spin up
anywhere near 100k server instances. So there's no way Cloud Functions
would eat up the whole concurrency limit. That would only happen if
there are many users on the client app accessing your database
directly from their devices.
So the described behavior in my question seems to be expected and it will not come any close to the limit of 100k connections from server side.

Firebase connection count with angular bindings?

I've read quite a few posts (including the firebase.com website) on Firebase connections. The website says that one connection is equivalent to approximately 1400 visiting users per month. And this makes sense to me given a scenario where the client makes a quick connection to the Firebase server, pulls down some data, and then closes the connection. However, if I'm using angular bindings (via angularfire), wouldn't each client visit (in the event the user stays on the site for a period of time) be a connection? In this example having 100 users (each of which is making use of firebase angular bindings) connecting to the site at the same time would be 100 connections. If I opted not to use angular bindings, that number could be (in a theoretical sense) 0 if all the clients already made their requests for data and were just idling.
Do I understand this properly?
AngularFire is built on top of Firebase's regular JavaScript/Web SDK. The connection count is fundamentally the same between them: if a 100 users are using your application at the same time and you are synchronizing data for each of them, you will have 100 concurrent connections at that time.
The statement that one concurrent connection is the equivalent of about 1400 visits per month is based on the extensive experience that the Firebase people have with how long the average connection lasts. As Andrew Lee stated in this answer: most developers vastly over-estimate the number of concurrent connections they will have.
As said: AngularFire fundamentally behaves the same as Firebase's JavaScript API (because it is built on top of that). Both libraries keep an open connection for a user, so that they can synchronize any changes that occur between the connected users. You can manually drop such a connection by calling goOffLine and then re-instate it with goOnline. Whether that is a good approach is largely dependent on the type of application you're building.
Two examples:
There recently was someone who was building a word game. He used Firebase to store the final score for each game. In his case explicitly managing the connections makes sense, because the connection is only needed for a relatively short time when compared to the time the application is active.
The "hello world" for Firebase programming is a chat application. In such an application it doesn't make a lot of sense to manage the connections yourself. So briefly connect every 15 seconds and then disconnect again. If you do this, you're essentially reverting to polling for updates. Doing so will lose you one of the bigger benefits of using Firebase: it automatically synchronizes data to connected clients.
So only you can decide whether explicit connection management is best for you application. I'd recommend starting without it (it's simpler) and first testing your application on a smaller scale to see how actual usage holds up to your expectation.

Sending notifications according to database value changes

I am working on a vendor portal. An owner of a shop will login and in the navigation bar (similar to facebook) I would like the number of items sold to appear INSTANTLY, WITHOUT ANY REFRESH. In facebook, new notifications pop up immediately. I am using sql azure as my database. Is it possible to note a change in the database and INSTANTLY INFORM the user?
Part 2 of my project will consist of a mobile phone app for the vendor. In this app I, too , would like to have the same notification mechanism. In this case, would I be correct if I search on push notifications and apply them?
At the moment my main aim is to solve the problem in paragraph 1. I am able to retrieve the number of notifications, but how on earth is it possible to show the changes INSTANTLY? thank you very much
First you need to define what INSTANT means to you. For some, it means within a second 90% of the time. For others, they would be happy to have a 10-20 second gap on average. And more importantly, you need to understand the implications of your requirements; in other words, is it worth it to have near zero wait time for your business? The more relaxed your requirements, the cheaper it will be to build and the easier it will be to maintain.
You should know that having near-time notification can be very expensive in terms of computing and locking resources. The more you refresh, the more web roundtrips are needed (even if they are minimal in this case). Having data fresh to the second can also be costly to the database because you are potentially creating a high volume of requests, which in turn could affect otherwise good performing requests. For example, if your website runs with 1000 users logged on, you may need 1000 database requests per second (assuming that's your definition of INSTANT), which could in turn create a throttling condition in SQL Azure if not designed properly.
An approach I used in the past, for a similar requirement (although the precision wasn't to the second; more like to the minute) was to load all records from a table in memory in the local website cache. A background thread was locking and refreshing the in memory data for all records in one shot. This allowed us to reduce the database traffic by a factor of a thousand since the data presented on the screen was coming from the local cache and a single database connection was needed to refresh the cache (per web server). Because we had multiple web servers, and we needed the data to be exactly the same on all web servers within a second of each other, we synchronized the requests of all the web servers to refresh the cache every minute. Putting this together took many hours, but it allowed us to build a system that was highly scalable.
The above technique may not work for your requirements, but my point is that the higher the need for fresh data, the more design/engineering work you will need to make sure your system isn't too impacted by the freshness requirement.
Hope this helps.

Resources