I deployed a GCP HTTP-triggered Cloud Function that queries an entity from GCP Datastore by its key. Here is the code:
```javascript
const Datastore = require('@google-cloud/datastore'); // note: '@', not '#'

exports.helloWorld = function helloWorld(req, res) {
  const client = Datastore();
  const key = client.key(['Person', 'harry']);
  client.get(key, function (err, entity) {
    if (err) {
      res.status(500).send(err.message);
      return;
    }
    res.status(200).send(JSON.stringify(entity));
  });
};
```
According to the logs this function takes ~1.6 seconds to complete when invoked. Repeat invocations are not any faster.
Removing the query and responding to the HTTP request takes ~0.5 seconds to complete, so it seems the query takes ~1.1 seconds to complete. For me this is unusably slow and it seems unlikely that this is the intended performance of GCP Datastore.
I thought that perhaps the DB and the Function are running in different regions, but I am unable to check: the instructions given in the documentation (https://cloud.google.com/datastore/docs/locations#location-r) are incorrect, and the region is not displayed on the page for me.
What might be the problem with my setup here? I was expecting ~50ms for simple queries rather than ~1100ms.
In this question you can find an attached screenshot of Google's stack trace, showing that the best time with Datastore and GCP is about 100 ms.
Honestly speaking, we worked with GCF and Datastore for more than 3 months, and this time is usually worse than 100 ms, about 200 - 400 ms per call. I had a conversation with GCP support and can confirm that they currently have troubles with this, something with request routing and optimization between GCF and Datastore specifically. I've collected several performance-test data sets through Yandex-Tank, and average request latency was about 800 ms to 7 seconds (about 4-5 serial Datastore requests).
After 3 months of development we moved to App Engine and found that Datastore behaves much faster in that environment: average time is about 20-30 ms (roughly 10x faster) per Datastore request.
I also noticed that Datastore lookup time barely depends on the amount of data it's running through. Whether it's 1 record or 1000 records, the time is almost the same 20-30 ms. I believe this time would be even better if we could look at Datastore itself without the extra network communication.
Now we are adding Redis as a caching service to speed up all requests. I guess it might work with GCF and Datastore as well, but I would not expect it to be a guaranteed solution.
Therefore, consider using GCF as utility processing units instead of as your primary processing endpoint.
Related
I have set up the Firebase local emulator and created a project with Cloud Functions and Firestore. I also exported my production data, which I import into the emulator. (The collection in question is about 5000 documents ranging from 5 kb to 200 kb in size.)
My goal was to benchmark query performance, so I wrote a query and ran it a number of times to get an average execution time of 130 ms. I then wrote a different query and got an average execution time of 20 ms. I did not import any indexes (the Admin SDK doesn't seem to require them when querying the emulator like it does when querying production).
I also observed the first query always takes significantly longer.
My question is basically: how does this difference in execution time translate to the production environment, if at all? Assuming the same queries are run against the same data, and ignoring network latency to/from the client, will the second query run about ~110 ms faster? Or will the difference be less/more?
Also why does the first query take longer, and is there any way to use that fact to improve performance in real world usage?
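As an aside on methodology: since the first run pays one-time costs (connection setup, caches), averages like the ones above are more trustworthy if the warm-up run is discarded. A minimal sketch of such a benchmark helper, where runQuery is a hypothetical stand-in for the actual emulator query:

```javascript
// Sketch of a benchmark that discards the first (warm-up) run, since the
// first query pays one-time costs (connection setup, caches).
// runQuery is a hypothetical stand-in for the actual emulator query.
async function benchmark(runQuery, iterations = 10) {
  await runQuery(); // warm-up run: not counted
  const times = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    await runQuery();
    const end = process.hrtime.bigint();
    times.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  return times.reduce((a, b) => a + b, 0) / times.length;
}

// Example with a fake 5 ms "query":
benchmark(() => new Promise((resolve) => setTimeout(resolve, 5)), 5)
  .then((avg) => console.log(`average: ${avg.toFixed(1)} ms`));
```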
how does this difference in execution time translate to the production environment, if at all.
The observed performance of the emulator has little to nothing to do with the performance of the actual cloud hosted product. It's not the same code, and it's not running on the same set of computing resources.
Firestore is massively scalable and shards your data across many computing resources, all of which work together to service a query and ensure that it performs at any scale. As you can imagine, an emulator running on your one local machine is nowhere near that. They are simply not comparable.
The emulator is meant to ease local development without requiring the use of paid cloud resources to get your job done. It's not meant for any other purpose.
I am developing a new React web site using Firebase hosting and firebase functions.
I am using a MySQL database (SQL is required for heavy data reporting) in GCP Cloud SQL, with GCP Secret Manager holding the database username/password.
The Firebase functions are used to pull data from the database and send the results back to the React app.
In my local emulator everything works correctly and it's responsive.
When it's deployed to Firebase, I'm noticing that the 1st and sometimes the 2nd request to a function take about 6 seconds to respond. After that they respond in less than 1 second. For the slow responses I can see in the logs that the database pool is initialized.
So the slow responses are the first hit to the instance. I'm assuming in my case two instances are being created.
Note that the functions that do not require a database respond quickly regardless of whether it is the 1st or 2nd call.
After about 15 minutes of not using a service I see the same problem. I'm assuming the instances are being reclaimed and a new instance is created.
The problem is that each function has its own independent db pool, so each function will initially give a slow response (maybe twice, on the second call).
The site will see low traffic, meaning most users will experience this slow response.
By removing the reference to Secret Manager and hard-coding the username/password, the response time dropped to less than 3 seconds, but this is still not acceptable.
Is there a way to:
1. Increase the time before a function is reclaimed if not used?
2. Tag an instance so that it is not reclaimed?
3. Create a global db pool that does not get shut down between recycles?
4. Work with db connections in Firebase Functions so as to avoid reinitializing the db pool?
5. Or is this the nature of functions, and am I limited to this behavior?
6. Since I am in early development, would moving to App Engine/Node.js (the Flexible Plan) resolve the recycling issues?
First of all, the issues you have been experiencing with the 1st and the 2nd requests taking the longest time are called cold starts.
This totally makes sense because new instances are spun up. You may have a cold start when:
Your function has been deployed but not yet triggered.
Your function has been idle (not processing requests) long enough that it has been recycled for resources.
Your function is auto-scaling to handle capacity and creating new instances.
I understand that your questions are intended to work around the issue of Cloud Functions recycling instances.
The straight answer to questions 1 to 4 is No, because Cloud Functions implements the serverless paradigm.
This means that one function invocation should not rely on in-memory state (such as a database pool) set by a previous invocation.
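That said, instances that happen to stay warm do reuse module-scope state, so a common mitigation (a cache, never a guarantee) is to initialize the pool lazily in global scope. A minimal runnable sketch of the pattern, with createPool as a hypothetical stand-in for a real pool constructor such as mysql.createPool():

```javascript
// Runnable sketch of the lazy global-initialization pattern: expensive
// setup (e.g. creating a db pool) happens once per instance, in module
// scope, so warm invocations on the same instance skip it.
let pool = null;
let initCount = 0; // counts how many times the expensive setup ran

function createPool() {
  initCount += 1; // the expensive connection work would happen here
  return { query: (sql) => `result of ${sql}` };
}

function getPool() {
  if (pool === null) pool = createPool(); // only on a cold instance
  return pool;
}

// Simulate three warm invocations hitting the same instance:
getPool().query('SELECT 1');
getPool().query('SELECT 2');
getPool().query('SELECT 3');
console.log(initCount); // the pool was built only once
```

On a recycled (cold) instance the module reloads and the pool is rebuilt, which is exactly the slow first request described above; the pattern only helps warm traffic.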
Now this does not mean that you cannot improve the cold start boot time.
Generally the number one contributor of cold start boot time is the number of dependencies.
This video from the Google Cloud Tech channel exactly goes over the issue you have been experiencing and describes in more detail the practices implemented to tune up Cloud Functions.
If, after going through the best practices from the video, your cold start still shows unacceptable values, then, as you have already suggested, you would need to use a product that allows you to keep a minimum set of instances spun up, like App Engine Standard.
You can further improve the readiness of the App Engine Standard instances by later on implementing warm up requests.
Warmup requests load your app's code into a new instance before any live requests reach that instance. The linked document also covers loading requests, which are similar to cold starts: the time when your app's code is being loaded onto a newly created instance.
I hope you find this useful.
I developed an Android application where I use Firebase as my main service for storing data, authenticating users, storage, and more.
I recently went deeper into the service and wanted to see the API usage in my Google Cloud Platform.
In order to do so, I navigated to https://console.cloud.google.com/ to see what it shows under APIs and Services, and checked what might be causing it (screenshots omitted).
Can someone please explain what "Latency" means here, and what could be the reason that this particular service has a much higher latency value compared to the other APIs?
Does this value have any impact on my application such as slowing the response or something else? If yes, are there any guidelines to lower this value?
Thank you
Latency is the "delay" before an operation starts. Cloud Functions, in particular, have to actually load and start a container (if they have been paused), or at least load from memory (it depends on how often the function is called).
Can this affect your client? Holy heck, yes. But what you can do about it is a significant study in and of itself. For Cloud Functions, the biggest latency comes from starting the "container" (assuming a cold start, which your low request count suggests): it has to load and initialize modules before calling your code. The same issue applies here as for browser code: tight code, minimal module loads, etc.
Some latency is to be expected from Cloud Functions (a couple hundred ms is typical). Design your client UX accordingly. Cloud Functions' real power isn't instantaneous response; rather it's the compute power available IN PARALLEL with browser operations, and the ability to spin up multiple instances to serve multiple browser sessions. Use it accordingly.
Listen and Write are long-lived streams. In this case, an 8-minute latency should be interpreted as a connection that was open for 8 minutes. Individual queries or write operations on those streams will be faster (milliseconds).
I have a Firebase realtime database structure that looks like this:
rooms/
  room_id/
    user_1/
      name: 'hey',
      connected: true
connected is a Boolean indicating whether the user is connected, and will be set to false using the onDisconnect() handler that Firebase provides.
Now my question is: if I trigger a Cloud Function every time the connected property of a user changes, can I run a setTimeout() for 45 seconds? If the connected property is still false at the end of the setTimeout() (for which I read that particular connected value from the db), then I delete the node of the user (like the user_1 node above).
Will this setTimeout() pose a problem if there are many triggers fired simultaneously?
In short, Cloud Functions have a maximum time they can run.
If your timeout fires its callback after that time limit has expired, the function will already have been terminated.
Consider instead a very simple and efficient way of calling scheduled code in Cloud Functions: a cron job.
If you use setTimeout() to delay the execution of a function for 45 seconds, that's probably not enough time to cause a problem. The default timeout for a function is 1 minute, and if you exceed that, the function will be forced to terminate immediately. If you are concerned about this, you should simply increase the timeout.
Bear in mind that you are paying for the entire time that a function executes, so even if you pause the function, you will be billed for that time. If you want a more efficient way of delaying some work for later, consider using Cloud Tasks to schedule that.
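If you do stay with the in-function delay, the core logic can be factored out so the timing concern is explicit and testable. A minimal sketch, with readConnected and deleteUser as hypothetical stand-ins for the Realtime Database read/remove calls (waitMs would be 45000 in the real trigger):

```javascript
// Sketch of the "wait, then re-check" logic from the question, written as
// a plain async helper so it can be exercised outside Cloud Functions.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function cleanUpIfStillDisconnected(readConnected, deleteUser, waitMs) {
  await delay(waitMs); // must finish before the function's timeout
  const connected = await readConnected(); // re-read the flag from the db
  if (!connected) {
    await deleteUser(); // user never reconnected: remove the node
    return true;
  }
  return false;
}

// Simulated usage with an in-memory flag instead of the database:
let deleted = false;
cleanUpIfStillDisconnected(
  async () => false,               // user is still disconnected
  async () => { deleted = true; }, // would be ref.remove() in production
  50                               // 50 ms for the demo; 45000 in the trigger
).then((removed) => console.log(removed, deleted)); // prints: true true
```

Note that each concurrent trigger holds its own function instance open for the full wait, so with many simultaneous disconnects this pattern multiplies both instance count and billed time, which is why Cloud Tasks is the cheaper option.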
As I understand it, your functionality is intended to monitor the users that are connected, or connecting, to the Firebase Realtime Database from the Cloud Function. Is that correct?
For monitoring the Firebase Realtime Database, GCP provides tools to monitor DB performance and usage.
Or do you simply want to keep the connection alive?
If the requests to the Firebase Realtime DB are RESTful requests like GET and PUT, a connection is only kept open per request; with a high volume of requests this still costs more.
Normally, we suggest the client to use the native SDKs for your app's platform, instead of the REST API. The SDKs maintain open connections, reducing the SSL encryption costs and database load that can add up with the REST API.
However, If you do use the REST API, consider using an HTTP keep-alive to maintain an open connection or use server-sent events and set keep-alive, which can reduce costs from SSL handshakes.
Problem
The first call to Firebase from a server takes ~15-20x longer than subsequent calls. While this is not a problem for a conventional server calling Firebase, it may cause issues with a serverless architecture leveraging Amazon Lambda / Google Cloud Functions.
Questions
Why is the first call so much slower? Is it due to authentication?
Are there any workarounds?
Is it practical to do some user-initiated computation of data on Firebase DB using Amazon Lambda/ Google Cloud Functions and return the results to the client within 1 - 2 seconds?
Context
I am planning on using a server-less architecture with Firebase as the repository of my data and Amazon Lambda/ Cloud Functions augmenting Firebase with some server-side computation, e.g. searching for other users. I intend to trigger the functions via HTTP requests from my client.
One concern I had was the long time taken by the first call to Firebase from the server. While testing some server-side code on my laptop, the first listener returns in 6 s! Subsequent calls return in 300 - 400 ms. The dataset is very small (2 - 3 key-value pairs), and I also tested by swapping the observers.
In comparison, a call to the Google Maps API from my laptop takes about 400ms to return.
I realise that response times would be considerably faster from a server. Still, a factor of 15 - 20x on the first call is disconcerting.
TL;DR: You're noticing something that's known/expected, though we will shave the penalty down as GA approaches. Some improvements will come sooner than later.
Cloud Functions for Firebase team member here. We are able to offer Cloud Functions at a competitive price by "scaling to zero" (shutting down all instances) after sustained lack of load. When a request comes in and you have no available instances, Cloud Functions creates one for you on demand. This is obviously slower than hitting an active server and is something we call a "cold start". Cold starts are part of the reality of "serverless" architecture, but we have many people working on ways to reduce the penalty dramatically.
There's another case that I've recently started calling a "lukewarm" start. After a deploy, the Cloud Function instance has been created, but your application still has warmup work to do like establishing a connection to the Firebase Realtime Database. Part of this is authentication, as you've suggested. We have detected a slowdown here that will be fixed next week. After that, you'll still have to pay for SSL + Firebase handshakes. Try measuring this latency; it's not clear how much we'll be able to circumvent it.
Thanks Frank!! I read up on how Firebase establishes WebSocket connections.
To add to Frank's answer: the initial handshake causes the delay on the first pull, and keeping the connection open drastically speeds up subsequent pulls. While testing on an Amazon Lambda instance running on a US west coast server, the response times were: 1) first pull: 1.6 - 2.3 s; 2) subsequent pulls: 60 - 100 ms. The dataset itself was extremely small, so one can assume these times are simply server-to-server communication. Takeaways:
Amazon Lambda instances can be triggered via an API gateway for non-time-critical computations, but they are not the ideal solution for real-time computations on Firebase data such as returning search results (unless there is a way to guarantee persisting the handshake across instances, which there doesn't seem to be from what I've read).
For time-critical computations, I am going with EC2/GAE instances leveraging Firebase Queue (https://github.com/firebase/firebase-queue). The approach is more involved than firing Lambda instances, but it returns results faster (because it avoids the handshake for every task).