How to force test a Firestore parallel Transaction? - firebase

Transactions are used for atomic changes and when two clients may change the same data at the same time.
I want to test in the dev env if my transaction is having its expected behavior when there is a parallel transaction running from multiple clients requests. It runs only in my Cloud Functions. I can't let any undesired behavior of this nature to happen in the prod env, so I want to check in dev if everything is alright when it happens, even being unlikely.
Is it possible to force this test case?
Using JS/TS.

In the case of a concurrent edit, Cloud Firestore runs the entire transaction again. For example, if a transaction reads documents and another client modifies any of those documents, Cloud Firestore retries the transaction. This feature ensures that the transaction runs on up-to-date and consistent data. Refer to this documentation regarding updating data with transactions.
You can check this post that discusses concurrent read and write at the same time. Another link that also has an example on how to create and run a transaction using Node.js.
Lastly, you can consider creating two FirebaseApp instances, and running the same transaction in both and then synchronize between the two in the single process that they run in. Or, use testing tools that support parallel tests, like Node TAP.

Related

Callback to cloud functions/cloud run if timeout happens?

I have some cloud run and cloud functions that serve to parse a large number of files that users upload. Sometimes users upload an exceedingly large number of files, and that causes these functions to timeout even when I set them to their maximum runtime limits (15 minutes for Cloud Run and 9 minutes for Cloud Functions respectively.) I have a loading icon corresponding to a database entry that shows the progress of processing each batch of files that's been uploaded, and so if the function times out currently, the loading icon gets stuck for that batch in perpetuity, as the database is not updated after the function is killed.
Is there a way for me to create say a callback function to the Cloud Run/Functions to update the database and indicate that the parsing process failed if the Cloud Run/Functions timed out? There is currently no way for me to know a priori if the batch of files is too large to process, and clearly I cannot use a simple try/catch here as the execution environment itself will be killed.
One popular method is to have a public-facing API location that you can invoke by passing on the remaining queued information. You should assume that this API location is compromised so some sort of OTP should be used. This does depend on some factors, such as how these files are uploaded or the cloud trigger was handled which may require you to store that information in a database location to be retrieved.
You can set a flag on the db before you start processing, then after processing, clear/delete the flag. Then have another function regularly check for the status.
No such callback functionality exists for either product.
Serverless products are generally not meant to be used for batch processing where the batches can easily be larger than the limits of the system. They are meant for small bits of discrete work, such as simple API calls.
If you want to process larger amounts of data, considering first uploading that to Cloud Storage (which will accept files of any size), then sending a pubsub message upon completion to a backend compute product that can handle the processing requirements (such as Compute Engine).
Direct answer. For example, you might be able to achieve that by filtering and creating a sink in the relevant StackDriver logs (where a cloud function timeout crash is to be recorded), so that the relevant log records are pushed into some PubSub topic. On the other side of that topic you may have some other cloud function, which can implement the desired functionality.
Indirect answer. Without context, scope and requirement details - it is difficult to provide a good suggestion... but, based on some guesses - I am not sure that the design is optimal. Serverless services are supposed to be used for handling independent and relatively small chunks of data. If are have something large - you might like to use the first, let's say cloud function, to divide it into reasonably small chunks, so they can be processed independently by, let's say the second cloud function. In your case - can you have a cloud function per file, for example? If a file is too large (a few Gb, or dozen Gb) - can it be saved to a cloud storage and read/processed in chunks, so that the cloud functions are triggered from he cloud storage? And so on. That approach should help, but has a drawback - complexity is increased, as you have to coordinate and control how the process is going...

Firebase Persistent database on first installation

My current application developed in Unity uses Firebase Realtime Database with database persistence enabled. This works great for offline use of the application (such as in areas of no signal).
However, if a new user runs the application for the first time without an internet connection - the application freezes. I am guessing, it's because it needs to pull down the database for the first time in order for persistence to work.
I am aware of threads such as: 'Detect if Firebase connection is lost/regained' that talk about handling database disconnection from Firebase.
However, is there anyway I can check to see if it is the users first time using the application (eg via presence of the persistent database?). I can then inform them they must go online for 'first time setup'?
In addition to #frank-van-puffelen's answer, I do not believe that Firebase RTDB should itself cause your game to lock up until a network connection occurs. If your game is playable on first launch without a network connect (ie: your logic itself doesn't require some initial state from the network), you may want to check the following issues:
Make sure you can handle null. If your game logic is in a Coroutine, Unity may decide to silently stop it rather than fully failing out.
If you're interacting with the database via Transactions, generally assume that it will run twice (once against your local cache then again when the cache is synced with the server if the value is different). This means that the first time you perform a change via a transaction, you'll likely have a null previous state.
If you can, prefer to listen to ValueChanged over GetValueAsync. You'll always get this callback on your main Unity thread, you'll always get the callback once on registration with the data in your local cache, and the data will be periodically updated as the server updates. Further, if you see #frank-van-puffelen answer elsewhere, if you're using GetValueAsync you may not get the data you expect (including a null if the user is offline). If your game is frozen because it's waiting on a ContinueWithOnMainThread (always prefer this to ContinueWith in Unity unless you have a reason not to) or an await statement, this could ValueChanged may work around this as well (I don't think this should be the case).
Double check your object lifetimes. There are a ton of reasons that an application may freeze, but when dealing with asynchronous logic definitely make sure you're aware of the differences between Unity's GameObject lifecycle and C#'s typical object lifecycle (see this post and my own on interacting with asynchronous logic with Unity and Firebase). If an objects OnDestroy is invoked before await, ContinueWith[OnMainThread], or ValueChanged is invoked, you're in danger of running into null references in your own code. This can happen if a scene changes, the frame after Destroy is called, or immediately following a DestroyImmediate.
Finally, many Firebase functions have an Async and synchronous variant (ex: CheckDependencies and CheckDependenciesAsync). I don't think there are any to call out for Realtime Database proper, but if you use the non async variant of a function (or if you spinlock on the task completing, including forgetting to yield in a coroutine), the game will definitely freeze for a bit. Remember that any cloud product is i/o bound by nature, and will typically run slower than your game's update loop (although Firebase does its best to be as fast as possible).
I hope this helps!
--Patrick
There is nothing in the Firebase Database API to detect whether its offline cache was populated.
But you can detect when you make a connection to the database, for example by listening to the .info/connected node. And then when that first is set to true, you can set a local flag in the local storage, for example in PlayerPrefs.
With this code in place, you can then detect if the flag is set in the PlayerPrefs, and if not, show a message to the user that they need to have a network connection for you to download the initial data.

Can I use wildcards when deleting Google Cloud Tasks?

I'm very new to Google Cloud Tasks.
I'm wondering, is there a way to use wildcards when deleting a task? For example, if I potentially had 3 tasks in queue using the following ID naming structure...
id-123-task-1
id-123-task-2
id-123-task-3
Could I simply delete id-123-task-* to delete all 3, or would I have to delete all 3 specific ID's every time? I guess I'm trying to limit the number of required API invocations to delete everything related to 'id-123'.
Can I use wildcards when deleting Google Cloud Tasks?
As of today, wildcards are not supported within Google Cloud Tasks. I can not confirm that you could pass the Google Cloud Task's ID as you mentioned id-123-task-* will delete all the tasks.
Nonetheless, if you are creating tasks for an specific purpose in mind, you could create a separate queue for this kind of tasks.
Not only you will win in terms of organizing your tasks, but when you would like to delete all, you will only need to purge all tasks from the specified queue making only 1 API invocation.
Here you could see how to purge all tasks from the specified queue, and also how to delete tasks and queues.
Also, I attached the API documentation in case you need further information about purging queues in Cloud Tasks.
As stated here, take into account that if you purge all the tasks from a queue:
Do not create new tasks immediately after purging a queue. Wait at least a second. Tasks created in close temporal proximity to a purge call will also be purged.
Also, if you are using named tasks, as stated here:
You can assign your own name to a task by using the name parameter. However, this introduces significant performance overhead, resulting in increased latencies and potentially increased error rates associated with named tasks. These costs can be magnified significantly if tasks are named sequentially, such as with timestamps.
As a consequence, if you are using named tasks, the documentation recommends using a well-distributed prefix for task names, such as a hash of the contents.
I think this is the best solution if you would like to limit the amount of API calls.
I hope it helps.

How to invoke other Cloud Firebase Functions from a Cloud Function

Let's say I have a Cloud Firebase Function - called by a cron job - that produces 30+ tasks every time it's invoked.
These tasks are quite slow (5 - 6 second each in average) and I can't process them directly in the original because it would time out.
So, the solution would be invoking another "worker" function, once per task, to complete the tasks independently and write the results in a database. So far I can think of three strategies:
Pubsub messages. That would be amazing, but it seems that you can only listen on pubsub messages from within a Cloud Function, not create one. Resorting to external solutions, like having a GAE instance, is not an option for me.
Call the worker http-triggered Firebase Cloud Function from the first one. That won't work, I think, because I would need to wait for a response from the all the invoked worker functions, after they finish and send, and my original Function would time out.
Append tasks to a real time database list, then have a worker function triggered by each database change. The worker has to delete the task from the queue afterwards. That would probably work, but it feels there are a lot of moving parts for a simple problem. For example, what if the worker throws? Another cron to "clean" the db would be needed etc.
Another solution that comes to mind is firebase-queue, but its README explicitly states:
"There may continue to be specific use-cases for firebase-queue,
however if you're looking for a general purpose, scalable queueing
system for Firebase then it is likely that building on top of Google
Cloud Functions for Firebase is the ideal route"
It's not officially supported and they're practically saying that we should use Functions instead (which is what I'm trying to do). I'm a bit nervous on using in prod a library that might be abandoned tomorrow (if it's not already) and would like to avoid going down that route.
Sending Pub/Sub messages from Cloud Functions
Cloud Functions are run in a fairly standard Node.js environment. Given the breadth of the Node/NPM ecosystem, the amount of things you can do in Cloud Functions is quite broad.
it seems that you can only listen on pubsub messages from within a Cloud Function, not create one
You can publish new messages to Pub/Sub topics from within Cloud Functions using the regular Node.js module for Pub/Sub. See the Cloud Pub/Sub documentation for an example.
Triggering new actions from Cloud Functions through Database writes
This is also a fairly common pattern. I usually have my subprocesses/workers clean up after themselves at the same moment they write their result back to the database. This works fine in my simple scenarios, but your mileage may of course vary.
If you're having a concrete cleanup problem, post the code that reproduces the problem and we can have a look at ways to make it more robust.

How can I keep objects stored in firebase cloud function RAM?

My application needs to build a couple of large hashmaps before processing a user's request. Ideally I want to store these hashmaps in-memory on the machine, which means it never has to do any expensive processing and can process any incoming requests quickly.
But this doesn't work for firebase because there's a chance a user triggers a new instance which sets off the very time-consuming preprocessing step.
So, I tried designing my application to use the firebase database, and get only the data it needs from the database each time instead of holding all the data in-memory. But, since the cloud functions are downloading loads of data from the database, I have now triggered over 1.7 GB in download for this month, just by myself from testing. This goes over the quota.
There must be something I'm missing; all I want is a permanent memory storage of some hashmaps. All I want is for those hashmaps to be ready by the time the function is called with a request. It seems like such a simple requirement; how come there is no way to do this?
If you want to store data in the container that runs your Cloud Functions, you can use its local tempfs, which is actually kept in memory. But this will disappear when the container is recycled, which happens when your function hasn't been access for a while. So this local file system will have to be rebuilt whenever the container spins up.
If you want permanent storage of values you generate, consider using Google Cloud Storage. It is probably a more cost effective option, and definitively the most scalable one.

Resources