How to minimise Firebase Function Latency - firebase

As per the documentation, Firebase Functions are currently supported for 4 regions only - “us-central1”, “us-east1", “europe-west1”, “asia-northeast1"
That means locations further away would incur more latency, and often that translates to lower performance.
How can this limitation be worked around?

1) Choosing a location that is closest to you. You can set up test cloud functions in different regions, and test the round-trip latency. Only you can discover the specifics about your location.
2) Focus your software architecture on infrastructure that is locally available.
Use the client-side Firestore library directly as much as possible. It supports offline data, queueing data to send out later if you don't have internet, and caching read data locally - you can't get faster latency than that! So make sure you use Firestore for CRUD operations.
3) Architect to use CloudFunctions for batch and background processesing. If any business-logic processing is required, write the data to Firestore (using client libraries), and have a FF trigger to do some processing upon the write data-event. Have that trigger update that record with the additional processing, and state. I believe that if you're using the client-side libraries there is a way to have the updated data automatically pushed back to the client-side. (edited)
You also have the bonus benefit of being able to control authorisation with Firestore Auth, where Functions don't have an admin-level authorisation control.
4) Reduce chatter - minimising the amount of CloudFunction calls overall, and ensuring your CloudFunctions themselves do more in one go and return more complete data in one go.

Related

Does it make sense to use Firestore Realtime as a way to interact with functions?

I have a good old fashioned piece of code that takes some inputs, runs some code and produces outputs for a variety of tasks. This code runs in a function, and it should be triggered by a UI action. It is relevant to store a history of these runs per user, to list them in a specific view of the UI, and, on returning users, put them back where they left.
Since the Firestore documentation and videos strongly encourage us to default to it for data exchange with the backend, I of course decided to entertain the idea before just moving on to simply implementing a callable.
First, it made sense to me to model the data as follows:
/users
id
/taskRuns
id
inputs
outputs
/history
id
date
...outputs
and make the data flow as follows:
The UI writes to the inputs field and it listens to the document.
The unction is triggered. It runs happily and it writes to outputs.
A second function listening to /assets puts copies over the inputs and outputs onto its corresponding /history subcollection.
The UI reacts to the change.
At this point, a couple of red flags are raised on my mind:
Since the UI has to both read and write /assets, a malicious user could technically trigger infinite loops here. I'm aware that the same thing is possible with a callable, but it somehow seems even easier in this scenario.
I lose the ability to do timeout and error handling the way one would with a simple API call.
However, I also understand that:
My function needs to write to the database anyway to store the outputs even if it was returning it à la API.
Logic for who's responsible for what has somewhat been broken down into pieces.
So - Am I looking at it wrong? Is there a world in which this approach would make sense over the callable?
The approach you're describing uses the database as a queue/cache/proxy for your backend functionality. I regularly use it myself with either Realtime Database or Firestore to expose backend functionality, because it:
Works more gracefully when there's intermittent loss of internet connection. Where a direct call to Cloud Functions would have to implement its own retries, the database SDKs already handle this scenario.
Uses the database as a proxy/cache for the result, so that repeated loads of the data don't cause additional calls to my Cloud Functions.
By setting a maximum number of instances on my Cloud Function, I can prevent overloading any legacy infrastructure my code calls, and the database becomes my queue of pending requests.
The pattern is also used in this video from Cloud Next 2019: Serverless in real life: a case study in the travel industry and this talk from Google I/O: Architecting for Data Contention in a Realtime World with Firebase.
That said, many of my team mates only ever use callable functions, so it's definitely a matter of preference and the requirements of your use-case.

Callback to cloud functions/cloud run if timeout happens?

I have some cloud run and cloud functions that serve to parse a large number of files that users upload. Sometimes users upload an exceedingly large number of files, and that causes these functions to timeout even when I set them to their maximum runtime limits (15 minutes for Cloud Run and 9 minutes for Cloud Functions respectively.) I have a loading icon corresponding to a database entry that shows the progress of processing each batch of files that's been uploaded, and so if the function times out currently, the loading icon gets stuck for that batch in perpetuity, as the database is not updated after the function is killed.
Is there a way for me to create say a callback function to the Cloud Run/Functions to update the database and indicate that the parsing process failed if the Cloud Run/Functions timed out? There is currently no way for me to know a priori if the batch of files is too large to process, and clearly I cannot use a simple try/catch here as the execution environment itself will be killed.
One popular method is to have a public-facing API location that you can invoke by passing on the remaining queued information. You should assume that this API location is compromised so some sort of OTP should be used. This does depend on some factors, such as how these files are uploaded or the cloud trigger was handled which may require you to store that information in a database location to be retrieved.
You can set a flag on the db before you start processing, then after processing, clear/delete the flag. Then have another function regularly check for the status.
No such callback functionality exists for either product.
Serverless products are generally not meant to be used for batch processing where the batches can easily be larger than the limits of the system. They are meant for small bits of discrete work, such as simple API calls.
If you want to process larger amounts of data, considering first uploading that to Cloud Storage (which will accept files of any size), then sending a pubsub message upon completion to a backend compute product that can handle the processing requirements (such as Compute Engine).
Direct answer. For example, you might be able to achieve that by filtering and creating a sink in the relevant StackDriver logs (where a cloud function timeout crash is to be recorded), so that the relevant log records are pushed into some PubSub topic. On the other side of that topic you may have some other cloud function, which can implement the desired functionality.
Indirect answer. Without context, scope and requirement details - it is difficult to provide a good suggestion... but, based on some guesses - I am not sure that the design is optimal. Serverless services are supposed to be used for handling independent and relatively small chunks of data. If are have something large - you might like to use the first, let's say cloud function, to divide it into reasonably small chunks, so they can be processed independently by, let's say the second cloud function. In your case - can you have a cloud function per file, for example? If a file is too large (a few Gb, or dozen Gb) - can it be saved to a cloud storage and read/processed in chunks, so that the cloud functions are triggered from he cloud storage? And so on. That approach should help, but has a drawback - complexity is increased, as you have to coordinate and control how the process is going...

Lowering Cloud Firestore API Latency

I developed an Android application where I use Firebase as my main service for storing data, authenticating users, storage, and more.
I recently went deeper into the service and wanted to see the API usage in my Google Cloud Platform.
In order to do so, I navigated to https://console.cloud.google.com/ to see what it has to show inside APIs and Services:
And by checking what might cause it I got:
Can someone please explain what is the meaning of "Latency" and what could be the reason that specifically this service has so much higher Latency value compared to the other API's?
Does this value have any impact on my application such as slowing the response or something else? If yes, are there any guidelines to lower this value?
Thank you
Latency is the "delay" until an operation starts. Cloud Functions, in particular, have to actually load and start a container (if they have paused), or at least load from memory (it depends on how often the function is called).
Can this affect your client? Holy heck, yes. but what you can do about it is a significant study in and of itself. For Cloud Functions, the biggest latency comes from starting the "container" (assuming cold-start, which your low Request count suggests) - it will have to load and initialize modules before calling your code. Same issue applies here as for browser code: tight code, minimal module loads, etc.
Some latency is to be expected from Cloud Functions (I'm pretty sure a couple hundred ms is typical). Design your client UX accordingly. Cloud Functions real power isn't instantaneous response; rather it's the compute power available IN PARALLEL with browser operations, and the ability to spin up multiple instances to respond to multiple browser sessions. Use it accordingly.
Listen and Write are long lived streams. In this case a 8 minute latency should be interpreted as a connection that was open for 8 minutes. Individual queries or write operations on those streams will be faster (milliseconds).

Can you programmatically detect re-execution in Google Cloud Dataflow?

In Google Cloud Dataflow (streaming pipeline), your data "bundles" can be re-executed because of failure or speculative execution. Is there any way of knowing that the current bundle/element is a re-execution?
This would be very useful to provide conditional behavior for side-effects (in our case: to help make a datastore update operation (read/write) idempotent).
I don't believe this is something that is offered through the Beam API but you can avoid the need to know this information through following mechanisms.
If writes to the external datastore are idempotent, simply introduce a fusion break by adding a Reshuffle transform before the write step. This will make sure that data to be written is not re-generated when there are failures.
If writes to external datastore are not idempotent (for example, files, BigQuery), usual mechanism is to combine (1) with writing to a temporary location first. When all (parallel) writes to temporary location are finished, results can be finalized in an idempotent and a failure safe way from a single worker.
Many Beam sinks utilize these mechanisms to write to external data stores in an idempotent manner. For streaming, usually, these operations are performed per window.

Meteor server-side memory usage for thousands of concurrent users

Based on this answer, it looks like the meteor server keeps an in-memory copy of the cache for each connected client. My understanding is that it gets used in order to avoid sending multiple copies of data when dealing with overlapping subscriptions on a client.
The relevant part of the linked answer (emphasis is mine):
The merge box: The job of the merge box is to combine the results (added, changed and removed calls) of all of a client's active publish functions into a single data stream. There is one merge box for each connected client. It holds a complete copy of the client's minimongo cache.
Assuming that answer is still accurate in the current version of meteor, couldn't that create a huge waste of memory on the server as the number of users increases?
As an off-the-cuff calculation, if an app had about a 100kB cache per client, then 10,000 concurrent users would use up 1GB of memory on the server, and 100,000 users a whopping 10GB! This would be true even if each client was looking at almost identical data. It seems plausible for an app use much more data than that per client, which would further exacerbate the problem.
Does this problem exist in the current version of Meteor? If so, what techniques can be used to limit the amount of memory the server needs to use to manage all the client subscriptions?
Take a look at this post by Arunoda at his meteorhacks.com blog:
http://meteorhacks.com/making-meteor-500-faster-with-smart-collections.html
which talks about his Smart Collections page:
http://meteorhacks.com/introducing-smart-collections.html
He created an alternative Collection stack which has succeeded in it's goals for speed, efficiency (memory & cpu) and scalability (you can see a graphed comparison in the post). Admittedly in his tests RAM usage was negligent with both Collection types, although the way he's implemented things there should be a very obvious difference with the type of use case you mentioned.
Also, you can see in this post on meteor-core:
https://groups.google.com/d/msg/meteor-core/jG1KLObX1bM/39aP4kxqWZUJ
that the Meteor developers are aware of his work and are cooperating in implementing some of the improvements into Meteor itself (but until then his smart package works great).
Important note! Smart collections relies on access to the Mongo Oplog. This is easy if you're running on your own machine or hosted infrastructure. If you're using a cloud based database, this option might not be available, or if it is, will cost a lot more than the smaller packages.

Resources