On the first invocation of my Firebase function I create a MySQL connection pool (which is expensive) and store it in global scope. This works fine as long as a single instance is serving requests. Assuming multiple function instances can exist under load, what's a good practice to prevent numerous such pools from being created?
There is a specific section in the Cloud Functions documentation: "Use global variables to reuse objects in future invocations".
There is no guarantee that the state of a Cloud Function will be preserved for future invocations. However, Cloud Functions often recycles the execution environment of a previous invocation. If you declare a variable in global scope, its value can be reused in subsequent invocations without having to be recomputed.
This way you can cache objects that may be expensive to recreate on each function invocation.
Cloud Functions scales up to 1,000 instances per region.
SQL connections scale up to 32,000+.
Practically speaking, you're not going to run into upper limits on your db connection(s).
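As a minimal sketch of that global-variable pattern (assuming the mysql2 package and the first-generation firebase-functions API; configuration values and names are illustrative), the pool is created lazily once per instance and reused by warm invocations:

```ts
// Sketch: lazily create one MySQL pool per function instance and reuse it
// across warm invocations via global scope.
import * as functions from "firebase-functions";
import * as mysql from "mysql2/promise";

let pool: mysql.Pool | undefined; // global scope: survives warm invocations

function getPool(): mysql.Pool {
  if (!pool) {
    pool = mysql.createPool({
      host: process.env.DB_HOST,
      user: process.env.DB_USER,
      password: process.env.DB_PASS,
      database: process.env.DB_NAME,
      // Keep the per-instance pool small; each instance serves one request at a time.
      connectionLimit: 1,
    });
  }
  return pool;
}

export const getUser = functions.https.onRequest(async (req, res) => {
  const [rows] = await getPool().query("SELECT * FROM users WHERE id = ?", [
    req.query.id,
  ]);
  res.json(rows);
});
```

Each instance then holds at most one small pool, so the total number of connections is bounded by the number of concurrently running instances rather than by the number of invocations.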
Transactions are used for atomic changes and when two clients may change the same data at the same time.
I want to test in the dev environment whether my transaction behaves as expected when a parallel transaction is running from multiple client requests. It runs only in my Cloud Functions. I can't let any undesired behavior of this nature happen in the prod environment, so I want to verify in dev that everything is alright when it happens, even if it's unlikely.
Is it possible to force this test case?
Using JS/TS.
In the case of a concurrent edit, Cloud Firestore runs the entire transaction again. For example, if a transaction reads documents and another client modifies any of those documents, Cloud Firestore retries the transaction. This feature ensures that the transaction runs on up-to-date and consistent data. Refer to this documentation regarding updating data with transactions.
You can check this post, which discusses concurrent reads and writes at the same time. There is also another link with an example of how to create and run a transaction using Node.js.
Lastly, you can consider creating two FirebaseApp instances, running the same transaction in both, and then synchronizing between the two in the single process they run in. Or use testing tools that support parallel tests, such as Node TAP.
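A minimal sketch of that two-app approach, assuming the Admin SDK (ideally pointed at the Firestore emulator); the project ID, document path, and artificial delay are illustrative, and the delay just widens the window in which the two transactions collide:

```ts
// Sketch: run the same transaction from two app instances concurrently to
// force contention and exercise Firestore's retry behavior.
import * as admin from "firebase-admin";

const appA = admin.initializeApp({ projectId: "demo-project" }, "appA");
const appB = admin.initializeApp({ projectId: "demo-project" }, "appB");

async function incrementCounter(db: admin.firestore.Firestore, label: string) {
  const ref = db.collection("counters").doc("shared");
  await db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    const current = (snap.get("value") as number | undefined) ?? 0;
    console.log(`${label} read value ${current}`);
    // Wait so the other client reads/writes the same document in the meantime.
    await new Promise((resolve) => setTimeout(resolve, 500));
    tx.set(ref, { value: current + 1 });
  });
}

async function main() {
  await Promise.all([
    incrementCounter(appA.firestore(), "client A"),
    incrementCounter(appB.firestore(), "client B"),
  ]);
  console.log("Both committed; the final value should be 2, not 1.");
}

main().catch(console.error);
```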
I have some Cloud Run services and Cloud Functions that parse a large number of files that users upload. Sometimes users upload an exceedingly large number of files, which causes these functions to time out even when I set them to their maximum runtime limits (15 minutes for Cloud Run and 9 minutes for Cloud Functions, respectively). I have a loading icon, backed by a database entry, that shows the progress of processing each batch of uploaded files; if the function times out, the loading icon for that batch gets stuck in perpetuity, because the database is not updated after the function is killed.
Is there a way for me to create, say, a callback for the Cloud Run/Cloud Functions service to update the database and indicate that the parsing process failed if it timed out? There is currently no way for me to know a priori whether a batch of files is too large to process, and clearly I cannot use a simple try/catch here, as the execution environment itself will be killed.
One popular method is to have a public-facing API endpoint that you can invoke, passing along the remaining queued information. You should assume that this endpoint could be compromised, so some sort of one-time token should be used. This depends on factors such as how the files are uploaded or how the cloud trigger was handled, which may require you to store that information in a database location so it can be retrieved later.
You can set a flag in the database before you start processing, then clear/delete the flag after processing finishes. Then have another function regularly check the status.
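A sketch of that watchdog idea, assuming Firestore and the first-generation firebase-functions scheduler; the collection name, field names, and staleness threshold are all illustrative:

```ts
// Sketch: batches are marked "processing" before work starts; a scheduled
// function flags batches that have been processing for too long as failed.
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();
const db = admin.firestore();

// Longer than the worst-case function timeout, so only killed runs are swept up.
const STALE_AFTER_MS = 20 * 60 * 1000;

export const reapStaleBatches = functions.pubsub
  .schedule("every 5 minutes")
  .onRun(async () => {
    const cutoff = admin.firestore.Timestamp.fromMillis(Date.now() - STALE_AFTER_MS);
    const stale = await db
      .collection("batches")
      .where("status", "==", "processing")
      .where("startedAt", "<", cutoff)
      .get();

    await Promise.all(
      stale.docs.map((doc) =>
        doc.ref.update({ status: "failed", failedReason: "timeout" })
      )
    );
  });
```

When the parser finishes normally it clears the flag (for example by setting status to "done"), so the reaper only touches batches whose instance was killed mid-run.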
No such callback functionality exists for either product.
Serverless products are generally not meant to be used for batch processing where the batches can easily be larger than the limits of the system. They are meant for small bits of discrete work, such as simple API calls.
If you want to process larger amounts of data, consider first uploading it to Cloud Storage (which will accept files of any size), then sending a Pub/Sub message upon completion to a backend compute product that can handle the processing requirements (such as Compute Engine).
Direct answer: you might be able to achieve this by filtering and creating a sink on the relevant Stackdriver logs (where a Cloud Function timeout crash is recorded), so that the relevant log records are pushed into a Pub/Sub topic. On the other side of that topic you can have another Cloud Function that implements the desired functionality.
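As a rough sketch of the consumer side of that idea (assuming you have already created a logging sink that routes the timeout log entries to a Pub/Sub topic named, say, function-timeouts; the payload fields used below are assumptions that depend on how you label your logs):

```ts
// Sketch: a Pub/Sub-triggered function that reacts to exported timeout logs
// and marks the corresponding batch as failed.
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

export const onTimeoutLog = functions.pubsub
  .topic("function-timeouts")
  .onPublish(async (message) => {
    const logEntry = message.json; // the exported LogEntry from the sink
    // You would need to carry a batch identifier in the log entry, e.g. via a
    // label or a structured log field written by the parser before it starts.
    const batchId = logEntry?.labels?.batchId;
    if (!batchId) return;

    await admin
      .firestore()
      .collection("batches")
      .doc(batchId)
      .update({ status: "failed", failedReason: "timeout" });
  });
```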
Indirect answer: without context, scope, and requirement details it is difficult to provide a good suggestion, but based on some guesses I am not sure the design is optimal. Serverless services are meant to handle independent and relatively small chunks of data. If you have something large, you might use a first Cloud Function to divide it into reasonably small chunks, so they can be processed independently by, say, a second Cloud Function. In your case, can you have a Cloud Function per file, for example? If a file is too large (a few GB, or dozens of GB), can it be saved to Cloud Storage and read/processed in chunks, so that the Cloud Functions are triggered from Cloud Storage? That approach should help, but it has a drawback: complexity increases, as you have to coordinate and control how the process is going.
I'm interested in using firebase functions.
I can't find any reference about the execution speed of Firebase functions.
So the question is: If I write a function (which is independent of network and outer resources), will it take nearly the same execution time every time I execute it? Is the execution speed consistent?
Each instance that Cloud Functions spins up to run your code will be an instance of the same container on the same hardware. Since there are no parallel executions of functions on the same instance, the functions all have access to the complete resources of their instance while they run.
The only way to change the performance is by changing the memory that is available in the container (which in turn also changes the CPU), but this is a configuration that you control and it applies to all instances of your function that start after you change it. For an overview of these instance types, see the table on the Cloud Functions pricing page.
And as Doug pointed out, if Cloud Functions needs to provision a new container for a function invocation there will be a delay that is known as a cold-start while the container is being set up.
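For illustration, the memory (and therefore CPU) tier can be chosen per function with the first-generation firebase-functions API; the values below are just examples:

```ts
// Sketch: pick an instance size for a specific function. Higher memory tiers
// also get more CPU, which is the main lever on execution speed.
import * as functions from "firebase-functions";

export const heavyWork = functions
  .runWith({ memory: "1GB", timeoutSeconds: 300 })
  .https.onRequest(async (req, res) => {
    // CPU-bound work benefits from the larger tier.
    res.send("done");
  });
```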
I want to cache the most recent records (say the last 24 hours) in an HTTP Firebase function.
In an HTTP Firebase function (say fetchLastXRecords), I look for the records in the cache (a global variable defined to store the records); if they are not found, I fetch them from the database and set the cache.
The problem arises when I want to update any record in the cache, because this global variable is not accessible to other Firebase functions (for example, Realtime Database change triggers).
What could be a good approach to updating the records in the cache? Maybe I can invoke the caching HTTP Firebase function and pass the updated records? Or I can store the updated records in the database, and the caching function later looks in the database and updates the cached records?
In Cloud Functions, you have no way to ensure that a global variable in your code is accessible by all of your functions. There are two things you need to know about how Cloud Functions works:
Under load, multiple server instances will be allocated to run your functions. These server instances don't share any state.
Each of your functions is deployed to different server instances. Two functions will never run on the same server instance.
As a result, if you have any values to share between functions, you should use a persistence mechanism, such as a database. When your functions need to read and write a shared value, they should access the database. Also, they should use some sort of atomic transaction to make sure that multiple concurrent reads and writes are safe.
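A minimal sketch of that approach, assuming the Realtime Database as the shared store (paths, shapes, and the memo lifetime are illustrative): the HTTP function reads a shared cache node, a database trigger keeps that node up to date, and any per-instance global variable is only a short-lived memo, never the source of truth.

```ts
// Sketch: keep the cache in the database so every instance of every function
// sees the same state; global variables only memoize per instance.
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

let memo: { records: unknown; fetchedAt: number } | undefined;

export const fetchLastXRecords = functions.https.onRequest(async (req, res) => {
  if (!memo || Date.now() - memo.fetchedAt > 60_000) {
    const snap = await admin.database().ref("cache/recentRecords").once("value");
    memo = { records: snap.val(), fetchedAt: Date.now() };
  }
  res.json(memo.records);
});

// A write trigger keeps the shared cache node current for everyone.
export const updateRecentCache = functions.database
  .ref("records/{recordId}")
  .onWrite(async (change, context) => {
    await admin
      .database()
      .ref(`cache/recentRecords/${context.params.recordId}`)
      .set(change.after.val());
  });
```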
My application needs to build a couple of large hashmaps before processing a user's request. Ideally I want to store these hashmaps in-memory on the machine, which means it never has to do any expensive processing and can process any incoming requests quickly.
But this doesn't work for Firebase, because there's a chance a user triggers a new instance, which sets off the very time-consuming preprocessing step.
So I tried designing my application to use the Firebase database and get only the data it needs from the database each time, instead of holding all the data in memory. But since the Cloud Functions are downloading loads of data from the database, I have now triggered over 1.7 GB in downloads this month, just by myself from testing. This goes over the quota.
There must be something I'm missing; all I want is permanent in-memory storage of some hashmaps, so that they are ready by the time the function is called with a request. It seems like such a simple requirement; how come there is no way to do this?
If you want to store data in the container that runs your Cloud Functions, you can use its local file system (a tmpfs, which is actually kept in memory). But this will disappear when the container is recycled, which happens when your function hasn't been accessed for a while. So this local file system will have to be rebuilt whenever the container spins up.
If you want permanent storage of values you generate, consider using Google Cloud Storage. It is probably a more cost-effective option, and definitely the most scalable one.
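A rough sketch combining the two ideas, assuming the hashmaps can be precomputed and stored as a JSON file in a Cloud Storage bucket (the file path is illustrative): each instance downloads the file once on cold start, keeps it in a global variable, and serves all later requests from memory.

```ts
// Sketch: build the hashmaps once, store them in Cloud Storage, and load them
// into global scope on cold start so warm invocations skip the expensive step.
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

type Hashmaps = Record<string, Record<string, unknown>>;
let hashmaps: Hashmaps | undefined;

async function loadHashmaps(): Promise<Hashmaps> {
  if (!hashmaps) {
    const [contents] = await admin
      .storage()
      .bucket() // default bucket; pass a name here if you use a different one
      .file("precomputed/hashmaps.json")
      .download();
    hashmaps = JSON.parse(contents.toString()) as Hashmaps;
  }
  return hashmaps;
}

export const handleRequest = functions.https.onRequest(async (req, res) => {
  const maps = await loadHashmaps(); // only the cold start pays the download
  res.json({ value: maps["someMap"]?.[String(req.query.key)] ?? null });
});
```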