Lowering Cloud Firestore API Latency - firebase

I developed an Android application where I use Firebase as my main service for storing data, authenticating users, file storage, and more.
I recently went deeper into the service and wanted to see the API usage in my Google Cloud Platform project.
To do so, I navigated to https://console.cloud.google.com/ and opened the APIs and Services dashboard.
Checking what might cause this, I saw that the Cloud Firestore API reports a far higher latency than the other APIs.
Can someone please explain what "Latency" means here, and why this service in particular has a much higher latency value than the other APIs?
Does this value have any impact on my application, such as slowing responses? If so, are there any guidelines for lowering it?
Thank you

Latency is the "delay" before an operation starts. Cloud Functions, in particular, have to actually load and start a container (if they have been paused), or at least load from memory (it depends on how often the function is called).
Can this affect your client? Holy heck, yes. But what you can do about it is a significant study in and of itself. For Cloud Functions, the biggest latency comes from starting the "container" (assuming a cold start, which your low request count suggests): it has to load and initialize modules before calling your code. The same advice applies here as for browser code: tight code, minimal module loads, etc.
Some latency is to be expected from Cloud Functions (a couple hundred milliseconds is typical), so design your client UX accordingly. Cloud Functions' real power isn't instantaneous response; rather, it's the compute power available IN PARALLEL with browser operations, and the ability to spin up multiple instances to serve multiple browser sessions. Use it accordingly.

Listen and Write are long-lived streams. In this case, an 8-minute latency should be interpreted as a connection that was open for 8 minutes. Individual queries or write operations on those streams will be faster (milliseconds).
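To make the distinction concrete, here is a minimal sketch using the Firestore web SDK (the config and collection name are placeholders, not from the question): onSnapshot() keeps a Listen stream open for as long as you stay subscribed, which the console reports as minutes of "latency", while each result it delivers arrives in milliseconds.

```typescript
import { initializeApp } from "firebase/app";
import { getFirestore, collection, onSnapshot } from "firebase/firestore";

const app = initializeApp({ projectId: "your-project" }); // hypothetical config
const db = getFirestore(app);

// Opens a long-lived Listen stream; its lifetime, not per-result speed,
// is what shows up as "latency" for the Listen method.
const unsubscribe = onSnapshot(collection(db, "messages"), (snapshot) => {
  snapshot.docChanges().forEach((change) => {
    console.log(change.type, change.doc.data()); // each change arrives in ms
  });
});

// Closing the stream ends the connection the metric was measuring.
// unsubscribe();
```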

Related

Firebase Functions, Cold Starts and Slow Response

I am developing a new React web site using Firebase hosting and firebase functions.
I am using a MySQL database (SQL required for heavy data reporting) in GCP Cloud Sql and GCP Secret Manager to house the database username/password.
The Firebase functions are used to pull data from the database and send the results back to the React app.
In my local emulator everything works correctly and it's responsive.
When it's deployed to Firebase, I'm noticing the 1st and sometimes the 2nd request to a function takes about 6 seconds to respond. After that they respond in less than 1 second. For the slow responses I can see in the logs that the database pool is initialized.
So the slow responses are the first hit to the instance. I'm assuming in my case two instances are being created.
Note that the functions that do not require a database respond quickly regardless of it being the 1st or 2nd call.
After about 15 minutes of not using a service I have the same problem. I'm assuming the instances are being reclaimed and new instances are being created.
The problem is each function will have its own independent db pool, so each function will initially give a slow response (maybe twice, for the second call).
The site will see low traffic meaning most users will experience this slow response.
By removing the reference to Secret Manager and hard-coding the username/password, the response time has dropped to less than 3 seconds. But this is still not acceptable.
Is there a way to:
Increase the time before a function is reclaimed when not used?
Tag an instance that it should not be reclaimed?
Is there a way to create a global db pool that does not get shutdown between recycles?
Is there an approach to work with db connections in Firebase Functions to avoid reinit of the db pool?
Is this the nature of functions, and am I limited to this behavior?
Since I am in early development, would moving to AppEngine/Node.js (the Flexible Plan) resolve recycling issues?
First of all, the issue you have been experiencing, with the 1st and sometimes the 2nd request taking the longest, is called a cold start.
This totally makes sense because new instances are spun up. You may have a cold start when:
Your function has been deployed but not yet triggered.
Your function has been idle (not processing requests) long enough that it has been recycled to free resources.
Your function is auto-scaling to handle capacity and creating new instances.
I understand that your five questions are intended to work around the issue of Cloud Functions recycling the instances.
The straight answer to questions 1 through 4 is no, because Cloud Functions implement the serverless paradigm.
This means that one function invocation should not rely on in-memory state (such as a database pool) set by a previous invocation.
Now this does not mean that you cannot improve the cold start boot time.
Generally, the number one contributor to cold-start boot time is the number of dependencies.
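Two common mitigations, shown in the sketch below, are to defer heavy imports until first use and to cache expensive objects such as the database pool in global scope, so warm invocations on the same instance can reuse them. This is only best effort (instances can still be recycled at any time), and it assumes a Node.js function with the mysql2 driver; the environment variables and query are placeholders.

```typescript
import * as functions from "firebase-functions";
import type { Pool } from "mysql2/promise";

// Held in global scope: rebuilt only after a cold start,
// reused by every warm invocation on the same instance.
let pool: Pool | undefined;

async function getPool(): Promise<Pool> {
  if (!pool) {
    // Lazy import: the driver is loaded on first use instead of at
    // module load, trimming cold-start boot time.
    const mysql = await import("mysql2/promise");
    pool = mysql.createPool({
      host: process.env.DB_HOST, // hypothetical configuration
      user: process.env.DB_USER,
      password: process.env.DB_PASS,
      database: process.env.DB_NAME,
      connectionLimit: 1, // an instance serves one request at a time
    });
  }
  return pool;
}

export const report = functions.https.onRequest(async (_req, res) => {
  const db = await getPool();
  const [rows] = await db.query("SELECT NOW() AS now"); // placeholder query
  res.json(rows);
});
```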
This video from the Google Cloud Tech channel goes over exactly the issue you have been experiencing and describes in more detail the practices for tuning up Cloud Functions.
If, after going through the best practices from the video, your cold start still shows unacceptable values, then, as you have already suggested, you would need to use a product that allows you to keep a minimum set of instances spun up, like App Engine Standard.
You can further improve the readiness of the App Engine Standard instances by implementing warmup requests later on.
Warmup requests load your app's code into a new instance before any live requests reach that instance. The same document also covers loading requests, which are similar to cold starts: the time during which your app's code is being loaded into a newly created instance.
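As a rough sketch of what a warmup handler looks like in the App Engine Node.js standard environment (the one-time work inside it is a placeholder; enabling it also requires listing warmup under inbound_services in app.yaml):

```typescript
import express from "express";

const app = express();

// App Engine sends GET /_ah/warmup to a new instance before routing
// live traffic to it, so expensive one-time setup can happen here.
app.get("/_ah/warmup", async (_req, res) => {
  // e.g. open the database pool, fetch secrets, prime caches...
  res.status(200).send("warmed");
});

app.listen(Number(process.env.PORT) || 8080);
```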
I hope you find this useful.

How to minimise Firebase Function Latency

As per the documentation, Firebase Functions are currently supported in only 4 regions: "us-central1", "us-east1", "europe-west1", and "asia-northeast1".
That means locations further away will incur more latency, and that often translates to lower performance.
How can this limitation be worked around?
1) Choose the location that is closest to you. You can set up test cloud functions in different regions and measure the round-trip latency; only you can discover the specifics of your location.
2) Focus your software architecture on infrastructure that is locally available.
Use the client-side Firestore library directly as much as possible. It supports offline data, queueing data to send out later if you don't have internet, and caching read data locally - you can't get lower latency than that! So make sure you use Firestore for CRUD operations.
3) Architect to use Cloud Functions for batch and background processing. If any business-logic processing is required, write the data to Firestore (using the client libraries) and have a Cloud Functions trigger do the processing on that write event, updating the record with the results and a state field (see the sketch after this list). I believe that if you're using the client-side libraries, there is a way to have the updated data automatically pushed back to the client.
You also have the bonus benefit of being able to control authorisation with Firestore security rules, whereas Functions run with admin-level access and have no equivalent per-user authorisation control.
4) Reduce chatter: minimise the number of Cloud Function calls overall, and make sure your Cloud Functions each do more in one go and return more complete data in one go.
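As referenced in point 3, here is a minimal sketch of the write-then-trigger pattern (the region, collection name, and fields are hypothetical; clients subscribed with the client library's snapshot listeners receive the update as a push):

```typescript
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

export const processOrder = functions
  .region("europe-west1") // point 1: pick the supported region closest to your users
  .firestore.document("orders/{orderId}")
  .onCreate(async (snapshot) => {
    const order = snapshot.data();
    // ...business-logic processing happens server-side (placeholder rule)...
    const total = order.qty * order.price;
    // Update the record with the result and a state field; listening
    // clients see the change without polling.
    await snapshot.ref.update({ state: "processed", total });
  });
```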

Need Firebase Database behaviour clarification when inside a Service

I am testing a feature which requires a Firebase database write to happen at midnight every day. Now it is possible that at this particular time the client app might not be connected to the internet.
I have been using Firebase with persistence off, as persistence can potentially cause stale-data issues in another feature of mine.
From my observation, if I disconnect the app before the write and keep it this way for a minute or so, Firebase eventually reconnects when I turn on the connectivity again and performs the write.
My main questions are:
Will this behaviour be consistent even if the connectivity is lost for quite a few hours?
Will Firebase timeout?
Since it is inside a forever running service, does it still need persistence to ensure that writes are not lost? (assume that the service does not restart).
If the service does restart, will the writes get lost?
I have some experience with this exact case, and I actually do NOT recommend the use of a background service for managing your Firebase requests. In fact, I wouldn't recommend managing Firebase requests at all (explained later).
Services, even though we can make them run forever, actually tend to get killed by the system quite a lot (unless you raise their process priority, but even then the system still might kill them).
If you issue a Firebase write call (of any kind) and your service gets killed, the write will get lost, as you said. Unless you create a sophisticated manager that stores requests that haven't been committed in internal storage and loads them each time the service restarts - but that is very dirty work, considering that the Firebase developers took care of us and made .setPersistenceEnabled(true) :)
I know you mentioned you don't want to use it, but I STRONGLY advise you to do so. It works like a charm, no services required, and you don't have to worry at all about managing your write requests. Perhaps it would be better to solve the other issue you have in order to make this possible.
To sum up, here's what I would do in your case:
I would call .setPersistenceEnabled(true) somewhere at the beginning (extending the Application class and calling it from onCreate() is recommended)
I would use Android's AlarmManager and register a BroadcastReceiver to receive an alarm at midnight (repetitive or not - you decide)
Inside the BroadcastReceiver, I'd simply call a write function of Firebase and worry about nothing :)
To make sure I covered all of your questions:
will this behaviour be consistent....
No. Case scenario: at midnight, your service has successfully received the call and is now trying to write to Firebase. If, for example, the user has no connection until 6 AM (just a case scenario), there is a very high chance the system will kill the service during those 6 hours, and your write will get lost. Flight time, or staying in an area with no internet coverage, are both examples of risky scenarios that could break your app's consistency.
Will Firebase time out?
It definitely could, as mentioned. I wouldn't take the risk and ship an 80-90%-working app. Use persistence and have a 100%-working app :)
I believe I covered the rest of the questions.
Good luck!

Why does the first Firebase call from the server take much longer to return than subsequent calls?

Problem
The first call to Firebase from a server takes ~15-20x longer than subsequent calls. While this is not a problem for a conventional server calling Firebase, it may cause issues with a serverless architecture leveraging Amazon Lambda / Google Cloud Functions.
Questions
Why is the first call so much slower? Is it due to authentication?
Are there any workarounds?
Is it practical to do some user-initiated computation on Firebase data using Amazon Lambda / Google Cloud Functions and return the results to the client within 1-2 seconds?
Context
I am planning on using a serverless architecture with Firebase as the repository of my data and Amazon Lambda / Cloud Functions augmenting Firebase with some server-side computation, e.g. searching for other users. I intend to trigger the functions via HTTP requests from my client.
One concern I had was the large time taken by the first call to Firebase from the server. While testing some server-side code on my laptop, the first listener returns in 6 s! Subsequent calls return in 300-400 ms. The dataset is very small (2-3 key-value pairs), and I also tested by swapping the observers.
In comparison, a call to the Google Maps API from my laptop takes about 400ms to return.
I realise that response times would be considerably faster from a server. Still, a factor of 15-20x on the first call is disconcerting.
TL;DR: You're noticing something that's known/expected, though we will shave the penalty down as GA approaches. Some improvements will come sooner than later.
Cloud Functions for Firebase team member here. We are able to offer Cloud Functions at a competitive price by "scaling to zero" (shutting down all instances) after sustained lack of load. When a request comes in and you have no available instances, Cloud Functions creates one for you on demand. This is obviously slower than hitting an active server and is something we call a "cold start". Cold starts are part of the reality of "serverless" architecture, but we have many people working on ways to reduce the penalty dramatically.
There's another case that I've recently started calling a "lukewarm" start. After a deploy, the Cloud Function instance has been created, but your application still has warmup work to do like establishing a connection to the Firebase Realtime Database. Part of this is authentication, as you've suggested. We have detected a slowdown here that will be fixed next week. After that, you'll still have to pay for SSL + Firebase handshakes. Try measuring this latency; it's not clear how much we'll be able to circumvent it.
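In the spirit of that suggestion, a minimal sketch for measuring the first-connection penalty from a server (the database URL and path are hypothetical; it assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account key):

```typescript
import * as admin from "firebase-admin";

admin.initializeApp({
  databaseURL: "https://your-project-default-rtdb.firebaseio.com", // hypothetical
});

// Times a single read; the first call includes the SSL + Firebase handshake.
async function timeRead(label: string): Promise<void> {
  const start = Date.now();
  await admin.database().ref("test/ping").once("value"); // hypothetical path
  console.log(`${label}: ${Date.now() - start} ms`);
}

(async () => {
  await timeRead("first read (pays the handshake)");
  await timeRead("second read (connection already open)");
  process.exit(0);
})();
```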
Thanks Frank!! Read up on how Firebase establishes WebSocket connections.
To add to Frank's answer: the initial handshake causes the delay on the first pull, and keeping the connection alive drastically speeds up subsequent data pulls. Testing on an Amazon Lambda instance running on a US west coast server, the response times were: 1) first pull: 1.6-2.3 s; 2) subsequent pulls: 60-100 ms. The dataset itself was extremely small, so one can assume these times are simply server-to-server communication. Takeaways:
Amazon Lambda instances can be triggered via an API gateway for non-time-critical computations, but they are not the ideal solution for real-time computations on Firebase data, such as returning search results (unless there is a way to guarantee persisting the handshake across instances - not from what I've read).
For time-critical computations, I am going with EC2 / GAE instances leveraging Firebase Queue: https://github.com/firebase/firebase-queue. The approach is more involved than firing Lambda instances, but would return results faster (because it avoids the handshake for every task).
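For what it's worth, one common pattern for giving the handshake at least a chance of being reused on Lambda is to initialize the SDK in module scope, so a warm container keeps its authenticated connection; a cold container still pays the full handshake. A sketch, with a hypothetical URL and path:

```typescript
import * as admin from "firebase-admin";

// Module scope runs once per container, not once per invocation:
// warm invocations reuse the already-established connection.
admin.initializeApp({
  databaseURL: "https://your-project-default-rtdb.firebaseio.com", // hypothetical
});

export const handler = async (): Promise<{ statusCode: number; body: string }> => {
  const snap = await admin.database().ref("stats/count").once("value"); // hypothetical path
  return { statusCode: 200, body: JSON.stringify(snap.val()) };
};
```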

Meteor server-side memory usage for thousands of concurrent users

Based on this answer, it looks like the Meteor server keeps an in-memory copy of the cache for each connected client. My understanding is that it is used to avoid sending multiple copies of data when dealing with overlapping subscriptions on a client.
The relevant part of the linked answer (emphasis is mine):
The merge box: The job of the merge box is to combine the results (added, changed and removed calls) of all of a client's active publish functions into a single data stream. There is one merge box for each connected client. It holds a complete copy of the client's minimongo cache.
Assuming that answer is still accurate for the current version of Meteor, couldn't that create a huge waste of memory on the server as the number of users increases?
As an off-the-cuff calculation, if an app had about a 100 kB cache per client, then 10,000 concurrent users would use up 1 GB of memory on the server, and 100,000 users a whopping 10 GB! This would be true even if each client was looking at almost identical data. It seems plausible for an app to use much more data than that per client, which would further exacerbate the problem.
Does this problem exist in the current version of Meteor? If so, what techniques can be used to limit the amount of memory the server needs to use to manage all the client subscriptions?
Take a look at this post by Arunoda at his meteorhacks.com blog:
http://meteorhacks.com/making-meteor-500-faster-with-smart-collections.html
which talks about his Smart Collections page:
http://meteorhacks.com/introducing-smart-collections.html
He created an alternative collection stack which has succeeded in its goals of speed, efficiency (memory & CPU), and scalability (you can see a graphed comparison in the post). Admittedly, in his tests RAM usage was negligible with both collection types, although with the way he's implemented things there should be a very obvious difference for the type of use case you mentioned.
Also, you can see in this post on meteor-core:
https://groups.google.com/d/msg/meteor-core/jG1KLObX1bM/39aP4kxqWZUJ
that the Meteor developers are aware of his work and are cooperating in implementing some of the improvements into Meteor itself (but until then his smart package works great).
Important note! Smart Collections relies on access to the MongoDB oplog. This is easy if you're running on your own machine or hosted infrastructure. If you're using a cloud-based database, this option might not be available, or if it is, it will cost a lot more than the smaller packages.
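Whichever collection stack you use, a standard way to limit merge-box memory is to publish only the fields and documents each client actually needs, since the per-client server-side copy shrinks accordingly. A minimal sketch (the collection, field names, and limit are hypothetical):

```typescript
import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

const Posts = new Mongo.Collection("posts");

Meteor.publish("recentPostTitles", function () {
  // The merge box only has to hold what this cursor publishes,
  // not the whole collection.
  return Posts.find(
    {},
    {
      fields: { title: 1, createdAt: 1 }, // send only the needed fields
      sort: { createdAt: -1 },
      limit: 50, // cap the documents cached per client
    }
  );
});
```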
