Firebase Functions, Cold Starts and Slow Response - firebase

I am developing a new React web site using Firebase hosting and firebase functions.
I am using a MySQL database (SQL required for heavy data reporting) in GCP Cloud Sql and GCP Secret Manager to house the database username/password.
The Firebase functions are used to pull data from the database and send the results back to the React app.
In my local emulator everything works correctly and its responsive.
When its deployed to Firebase Im noticing the 1st and sometimes the 2nd request to a function takes about 6 seconds to respond. After that they respond less than 1 sec. For the slow responses I can see in the logs the database pool is initialized.
So the slow responses are the first hit to the instance. Im assuming in my case two instances are being created.
Note that the functions that do not require a database respond quickly regardless of it being the 1st or 2nd call.
After about 15 minutes of not using a service I have the same problem. Im assuming the instances are being reclaimed and a new instance is being created.
The problem is each function will have its own independent db pool so each function will initially provide a slow response (maybe twice for the second call).
The site will see low traffic meaning most users will experience this slow response.
By removing the reference to Secret Manager and hard coding username/password the response has dropped to less than 3 seconds. But this is still not acceptable.
Is there a way to:
Increase the time that a function is reclaimed if not used?
Tag an instance that it should not be reclaimed?
Is there a way to create a global db pool that does not get shutdown between recycles?
Is there an approach to work with db connections in Firebase Functions to avoid reinit of the db pool?
Is this the nature of functions and Im limited to this behavior?
Since I am in early development, would moving to AppEngine/Node.js (the Flexible Plan) resolve recycling issues?

First of all, the issues you have been experiencing with the 1st and the 2nd requests taking the longest time are called cold starts.
This totally makes sense because new instances are spun up. You may have a cold start when:
Your function has been deployed but not yet triggered.
Your function has been idle(not processing requests) enough that it has been recycled for resources.
Your function is auto-scaling to handle capacity and creating new instances.
I understand that your five questions are intended to work around the issue of Cloud Functions recycling the instances.
The straight answer from questions 1 to 4 is No because Cloud Functions implement the serverless paradigm.
This means that one function invocation should not rely on in-memory state(database pool) set by a previous invocation.
Now this does not mean that you cannot improve the cold start boot time.
Generally the number one contributor of cold start boot time is the number of dependencies.
This video from the Google Cloud Tech channel exactly goes over the issue you have been experiencing and describes in more detail the practices implemented to tune up Cloud Functions.
If after going through the best practices from the video, your coldstart shows unacceptable values, then, as you have already suggested, you would need to use a product that allows you to have a minimum set of instances spun up like App Engine Standard.
You can further improve the readiness of the App Engine Standard instances by later on implementing warm up requests.
Warmup requests load your app's code into a new instance before any live requests reach that instance. The last document talks about loading requests which is similar to cold starts in which it is the time when your app's code is being loaded to a newly created instance.
I hope you find this useful.

Related

Lowering Cloud Firestore API Latency

I developed an Android application where I use Firebase as my main service for storing data, authenticating users, storage, and more.
I recently went deeper into the service and wanted to see the API usage in my Google Cloud Platform.
In order to do so, I navigated to https://console.cloud.google.com/ to see what it has to show inside APIs and Services:
And by checking what might cause it I got:
Can someone please explain what is the meaning of "Latency" and what could be the reason that specifically this service has so much higher Latency value compared to the other API's?
Does this value have any impact on my application such as slowing the response or something else? If yes, are there any guidelines to lower this value?
Thank you
Latency is the "delay" until an operation starts. Cloud Functions, in particular, have to actually load and start a container (if they have paused), or at least load from memory (it depends on how often the function is called).
Can this affect your client? Holy heck, yes. but what you can do about it is a significant study in and of itself. For Cloud Functions, the biggest latency comes from starting the "container" (assuming cold-start, which your low Request count suggests) - it will have to load and initialize modules before calling your code. Same issue applies here as for browser code: tight code, minimal module loads, etc.
Some latency is to be expected from Cloud Functions (I'm pretty sure a couple hundred ms is typical). Design your client UX accordingly. Cloud Functions real power isn't instantaneous response; rather it's the compute power available IN PARALLEL with browser operations, and the ability to spin up multiple instances to respond to multiple browser sessions. Use it accordingly.
Listen and Write are long lived streams. In this case a 8 minute latency should be interpreted as a connection that was open for 8 minutes. Individual queries or write operations on those streams will be faster (milliseconds).

Understanding App Insights end to end for occassional long response times

Background: I have an ASP.NET Core App and have an API method that takes a file name of a blob that the frontend has uploaded to Azure Blob. It then needs to create a thumbnail version of the blob and return the name of the newly uploaded thumbnail Blob. Sometimes, for exactly the same file size it can take up to 40 seconds to complete. Mostly, it's around 400ms.
Below is the end to end from App Insights, I have a few things I don't understand:
1) The request duration is 37.5 s but yet the other operations add up to nowhere near this time
2) Why are there calls to master db? We are using EF6 with multiple contexts
3) The app is using an Azure App Service and SQL Azure. I don't understand why the response time is so inconsistent.
Any help would be much appreciated!
I've noticed multiple time that the first request after an application is deployed to Azure or after a long period that no requests were made to the application, it takes significantly longer to get a response.
As far as I remember it was related to start-up time of the site (if you're using an App Service on Windows based underlying VM it still uses IIS as a reverse proxy).
I solved the issue by configuring health checks that occasionally perform requests to the app.
Also, in addition to Application Insights (which logs information only after the application has started), you can try the tools listed here to see more information.
Hope it helps!
1.
The way the request timeline is displayed gives you only the time-span for the whole request (37.5s) and the individual time-spans for each dependency.
A dependency being another call that sends its run-time to the application insights.
In your example each call to the database is automatically tracked as a dependency. The code running after each database call is not though.
So e.g. requesting a database entry which takes 200ms and then issuing a Thread.Sleep of 2 seconds and requesting another database entry which takes 300ms would result in a 2 second gap between the two database-call dependencies which will each be listed with 200/300ms respectively.
You can use TelemetryClient.TrackDependency to wrap parts of your own code into its own dependency. This way you will see your own code as an entry on the request timeline.
2.
Depending on your EntityFramework database-initialisier EF will connect to the master db on context creation. (E.g. to create the database if it does not exist).
3.
Try tracking your own code to find out what parts of it are slow. EF has a few performance issues to consider, try to understand the performance caveats of the libs you use. If your calls are inconsistently slow it might be an issue with resources being over-utilized or caches being emptied too early (like for EF warm vs. cold queries).

Need Firebase Database behaviour clarification when inside a Service

I am testing a feature which requires a Firebase database write to happen at midnight everyday. Now it is possible that at this particular time, the client app might not be connected to the internet.
I have been using Firebase with persistence off as that can potentially cause issues of stale data in another feature of mine.
From my observation, if I disconnect the app before the write and keep it this way for a minute or so, Firebase eventually reconnects when I turn on the connectivity again and performs the write.
My main questions are:
Will this behaviour be consistent even if the connectivity is lost for quite a few hours?
Will Firebase timeout?
Since it is inside a forever running service, does it still need persistence to ensure that writes are not lost? (assume that the service does not restart).
If the service does restart, will the writes get lost?
I have some experience with this exact case, and I actually do NOT recommend the use of a background service for managing your Firebase requests. In fact, I wouldn't recommend managing Firebase requests at all (explained later).
Services, even though we can make them run forever, tend to get killed by the system quite a lot actually (unless you set their CPU priority to a higher level, but even then the system still might kill them).
If you call a Firebase Write call (of any kind), and your service gets killed, the write will get lost as you said. Unless, you create a sophisticated manager in which you store requests that haven't been committed into your internal storage, and load them up each time the service is restarted - but that is a very dirty work to do, considering the fact that Firebase Developers took care of us and made .setPersistenceEnabled(true) :)
I know, you mentioned you don't want to use it, but I STRONGLY advise you to do so. It works like charm, no services required, and you don't have to worry at all about managing your write requests. Perhaps it would be better to solve the other issue you have in order to make this possible.
To sum up, here's what I would do in your case:
I would call the .setPersistenceEnabled(true) someplace at the beginning (extending the Application class and calling it from onCreate() is recommended)
I would use Android's AlarmManager and register a BroadcastReceiver to receive an alarm at midnight (repetitive or not - you decide)
Inside the BroadcastReceiver, I'd simply call a write function of Firebase and worry about nothing :)
To make sure I covered all of your questions:
will this behaviour be consistent....
No. Case-scenario: Midnight time, your service has successfully received the call and is now trying to write into Firebase. If, for example, the user has no connection until 6 AM (just a case scenario), there is a very high chance that the system will kill it during those 6 hours, and your write will get lost. Flight Time, or staying in an area with no internet coverage - both are examples of risky scenarios that could break your app's consistency
Will Firebase Timeout?
It definitely could, as mentioned. I wouldn't take the risk and make a 80-90% working app. Use persistence and have a 100% working app :)
I believe I covered the rest of the questions..
Good luck!

Why does the first Firebase call from the server take much longer to return than subsequent calls?

Problem
First call to Firebase from a server takes ~15 - 20X longer than subsequent calls. While this is not a problem for a conventional server calling upon Firebase, it may cause issues with a server-less architecture leveraging Amazon Lambda/ Google Cloud Functions.
Questions
Why is the first call so much slower? Is it due to authentication?
Are there any workarounds?
Is it practical to do some user-initiated computation of data on Firebase DB using Amazon Lambda/ Google Cloud Functions and return the results to the client within 1 - 2 seconds?
Context
I am planning on using a server-less architecture with Firebase as the repository of my data and Amazon Lambda/ Cloud Functions augmenting Firebase with some server-side computation, e.g. searching for other users. I intend to trigger the functions via HTTP requests from my client.
One concern that I had was the large time taken by the first call to Firebase from the server. While testing some server-side code on my laptop, the first listener returns back in 6s! Subsequent calls return in 300 - 400ms. The dataset is very small (2 - 3 key value pairs) and I also tested by swapping the observers.
In comparison, a call to the Google Maps API from my laptop takes about 400ms to return.
I realise that response times would be considerably faster from a server. Still a factor of 15 - 20X on the first call is disconcerting.
TL;DR: You're noticing something that's known/expected, though we will shave the penalty down as GA approaches. Some improvements will come sooner than later.
Cloud Functions for Firebase team member here. We are able to offer Cloud Functions at a competitive price by "scaling to zero" (shutting down all instances) after sustained lack of load. When a request comes in and you have no available instances, Cloud Functions creates one for you on demand. This is obviously slower than hitting an active server and is something we call a "cold start". Cold starts are part of the reality of "serverless" architecture, but we have many people working on ways to reduce the penalty dramatically.
There's another case that I've recently started calling a "lukewarm" start. After a deploy, the Cloud Function instance has been created, but your application still has warmup work to do like establishing a connection to the Firebase Realtime Database. Part of this is authentication, as you've suggested. We have detected a slowdown here that will be fixed next week. After that, you'll still have to pay for SSL + Firebase handshakes. Try measuring this latency; it's not clear how much we'll be able to circumvent it.
Thanks Frank!! Read up on how firebase establishes web-socket connections.
To add to Frank's answer, the initial handshake causes the delay in the first pull. The approach drastically speeds up subsequent data pulls. While testing on an Amazon Lambda instance running on a US-west coast server. The response times were: 1) First pull: 1.6 - 2.3s 2) Subsequent pulls: 60 - 100ms. The dataset itself was extremely small, so one can assume that these time periods are simply for server-to-server communications. Takeaways:
Amazon Lambda instances can be triggered via an API gateway for non time-critical computations but is not the ideal solution for real-time computations on Firebase data such as returning search results (unless there is a way to guarantee persisting the handshake over instances - not from what I've read)
For time critical computations, I am going for running EC2/ GAE instances leveraging Firebase Queue. https://github.com/firebase/firebase-queue. The approach is more involved than firing lambda instances, but would return results faster (because of avoiding the handshake for every task).

Slow Transactions - WebTransaction taking the hit. What does this mean?

Trying to work out why some of my application servers have creeped up over 1s response times using newrelic. We're using WebApi 2.0 and MVC5.
As you can see below the bulk of the time is spent under 'WebTransaction'. The throughput figures aren't particularly high - what could be causing this, and what are the steps I can take to reduce it down?
Thanks
EDIT I added transactional tracing to this function to get some further analysis - see below:
Over 1 second waiting in System.Web.HttpApplication.BeginRequest().
Any insight into this would be appreciated.
Ok - I have now solved the issue.
Cause
One of my logging handlers which syncs it's data to cloud storage was initializing every time it was instantiated, which also involved a call to Azure table storage. As it was passed into the controller in question, every call to the API resulted in this instantiate.
It was a blocking call, so it added ~1s to every call. Once i configured this initialization to be server life-cycle wide,
Observations
As the blocking call was made at the time of the Controller being build (due to Unity resolving the dependancies at this point) New Relic reports this as
System.Web.HttpApplication.BeginRequest()
Although I would love to see this a little granular, as we can see from the transactional trace above it was in fact the 7 calls to table storage (still not quite sure why it was 7) that led me down this path.
Nice tool - my new relic subscription is starting to pay for itself.
It appears that the bulk of time is being spent in Account.NewSession. But it is difficult to say without drilling down into your data. If you need some more insight into a block of code, you may want to consider adding Custom Instrumentation
If you would like us to investigate this in more depth, please reach out to us at support.newrelic.com where we will have you account information on hand.

Resources