Lambda as a leaky state machine?

I have a micro-service which is involved in an OAuth 1 interaction. I'm finding myself in a situation where two runs of the Lambda function with precisely the same starting state have very different outcomes (where state is considered the "event" passed in, the environment variables, and the "stageParameters" from the API Gateway).
Here's a CloudWatch log that shows two back-to-back runs:
You can see that while the starting state is identical, the execution path diverges almost immediately. In the second (failure) case, you see the log entry "Auth state changed: null" ... that is very odd indeed, because it is logged before even the first line of code of the handler executes. Here's the beginning of the handler:
export const handler = (event, context, cb) => {
  console.log('EVENT:\n', JSON.stringify(event, null, 2));
So where is this premature log entry coming from? Well, one must assume it is somehow left over from a prior execution. Let me demonstrate ... it is in fact an event listener that was set up in the prior execution. This function interacts with a Firebase DB, and the first time it connects it sets up the following:
auth.signInWithEmailAndPassword(username, password)
  .then((result) => {
    auth.onAuthStateChanged(this.watchAuthState);
  });
where the watchAuthState function is simply:
watchAuthState(user) {
  console.log(`Auth state changed:\n`, JSON.stringify(user, null, 2));
}
This seems to mean that when the function runs a second time it is already "initialized" with the Firebase DB, but apparently the authentication has been invalidated. My number one aim is to just get back to a predictable state model and have the function execute precisely the same way each time.
If there are sneaky ways to reuse cached state between Lambda executions in resource-efficient ways, then that would be interesting too, but only if it can be done while keeping the state machine predictable.

Regarding the log order, look at the ID that comes after the timestamp at the beginning of each line. I believe this is the invocation ID. The two lines you have highlighted in orange are from different invocations of the function. The EVENT log is the first line logged from the invocation with the ID ending in 754ee. The "Auth state changed: null" line comes from the earlier invocation of the function, with the invocation ID ending in c40d5.
It looks like you are setting the auth state to null at the end of an invocation, but the Firebase connection is global, so the second function invocation thinks the Firebase connection is already initialized, and then it throws errors because the authentication was nulled out.
"My number one aim is to just get back to a predictable state model and have it execute precisely the same each time."
Then you need to be aware of Lambda container reuse, and not use any global variables.
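For example, one way back to a predictable model is to create and tear down everything inside the handler, so nothing survives into a reused container. A minimal sketch, assuming the v8-style Firebase JS SDK and credentials in environment variables (the teardown shape here is illustrative, not from the original post):

import firebase from 'firebase/app';
import 'firebase/auth';

// Assumes firebase.initializeApp(...) has already been called.
export const handler = async (event, context) => {
  console.log('EVENT:\n', JSON.stringify(event, null, 2));

  const auth = firebase.auth();

  // Attach the listener for this invocation only, keeping the
  // unsubscribe function so it can be detached before returning.
  const unsubscribe = auth.onAuthStateChanged((user) => {
    console.log('Auth state changed:\n', JSON.stringify(user, null, 2));
  });

  try {
    await auth.signInWithEmailAndPassword(
      process.env.FIREBASE_USER,
      process.env.FIREBASE_PASSWORD
    );
    // ... the actual OAuth work goes here ...
  } finally {
    // Detach the listener so it cannot fire in a later invocation
    // that reuses this container.
    unsubscribe();
  }
};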

Related

Cloud functions are getting called two times, first time with body and second time without

My index.ts has:
exports.foo = functions.https.onCall(async (data, context) => {
  console.log('Hello World');
  return null;
});
To deploy, I run:
firebase deploy --only functions:foo
To test, I do:
final callable = FirebaseFunctions.instance.httpsCallable('foo');
await callable.call();
The first time, when the function execution starts, my function body runs, but the second time (I don't know how it gets invoked), the body doesn't run. Is this standard behavior, and am I also getting charged for the automatic second invocation?
NOTE: I've read several posts like this, this, this, this, etc. before asking this question, but none of them seemed to work for me.
I've seen and more-or-less logged this; for example, recently some "minimum instances = 1" functions seem to start up and run a few times a day, even though the function itself isn't invoked. I also see this at deploy time (I use some custom code that deploys multiple functions at a time).
The way "cold starts" work is that they have to run the function files once to FIND and ASSIGN the exported functions. This part used to run silently. It would be nifty if Google either went back to NOT logging this, or differentiated it in the logs.
I don't know Flutter, but you run it "as if you were in a browser". In addition, pressing a button usually submits something (I mean, it's not a GET request most of the time).
So the combination of both, plus your issue, leads me to think of a CORS preflight request. Check the HTTP verb before performing the processing.
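Callable functions normally handle CORS for you, so if you go this route it would be with an onRequest function; a sketch of answering the preflight before doing any work (the header values are illustrative):

const functions = require('firebase-functions');

exports.fooHttp = functions.https.onRequest((req, res) => {
  // Browsers send a CORS preflight (OPTIONS) before the real request;
  // answer it without running the main logic.
  if (req.method === 'OPTIONS') {
    res.set('Access-Control-Allow-Origin', '*');
    res.set('Access-Control-Allow-Methods', 'POST');
    res.set('Access-Control-Allow-Headers', 'Content-Type');
    res.status(204).send('');
    return;
  }

  // Real request: run the actual work.
  console.log('Hello World');
  res.json({ result: null });
});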

Scheduled Firebase/Cloud Functions Overlapping Problem

I have a scheduled function that runs every three minutes.
It is supposed to look on the database (firestore), query the relevant users, send them emails or perform other db actions.
Once it sends an email to a user, it updates the user with a field 'sent_to_today:true'.
If sent_to_today == true, the function won't touch that user for around 24 hours, which is what's intended.
But because I have many users and the function is doing a lot of work, by the time it updates a user with sent_to_today: true, another invocation has already reached that user and processed them for sending emails.
This results in some users getting the same email, twice.
What are my options to make sure this doesn't happen?
Data Model (simplified):
users (Collection)
--- userId (document)
--- sent_to_today [Boolean]
--- NextUpdateTime [String representing a timestamp in ISO format]
When the function runs, if ("Now" >= NextUpdateTime) && (sent_to_today==false), the user is processed, otherwise, they're skipped.
How do I make sure that the user is only processed by one invocation per day, and not many?
As I said, by the time one invocation processes a user (and sets "sent_to_today" to true), the next invocation gets to that user and processes them again.
Any help in structuring the data better or using any other logical method would be greatly appreciated.
Here is an idea I'm considering:
Each invocation sets a field on a global document, e.g. busy_right_now: true, at the start, and sets it back to false when finished. If a subsequent invocation runs before the current one has finished, it does nothing while busy_right_now is still true.
Option 1.
Could the function be invoked once every ten minutes, rather than every three? If yes, just modify the scheduler and make sure the 'max instances' attribute is 1. As the function timeout is only 540 seconds, 10 minutes (600 seconds) is more than enough to avoid overlapping. For example:
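A sketch of both knobs in firebase-functions (the schedule string and options are illustrative):

const functions = require('firebase-functions');

exports.emailJob = functions
  .runWith({ maxInstances: 1, timeoutSeconds: 540 })
  .pubsub.schedule('every 10 minutes')
  .onRun(async (context) => {
    // query the relevant users and send emails here
  });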
Option 2.
When a firestore document is chosen for processing, the cloud function modifies some attribute - e.g. __state - setting its value to IN_PROGRESS. When the processing is finished (the email is sent), that attribute is modified again - to DONE, for example. Thus, if the function picks up a document whose __state attribute is IN_PROGRESS, it simply ignores it and continues to the next one.
The drawback: if the function crashes, there might be documents left in the IN_PROGRESS state, so there should be some mechanism to monitor and resolve such cases.
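Since several invocations may race for the same document, the IN_PROGRESS claim itself should be atomic. A sketch using a Firestore transaction (field names from this answer; the helper is illustrative):

const admin = require('firebase-admin'); // assumes admin.initializeApp() has run

// Atomically claim one user document; returns true only for the single
// invocation that moves it out of the idle state.
async function claimUser(userRef) {
  return admin.firestore().runTransaction(async (tx) => {
    const snap = await tx.get(userRef);
    if (snap.get('__state') === 'IN_PROGRESS' || snap.get('sent_to_today')) {
      return false; // another invocation got here first, or already done
    }
    tx.update(userRef, { __state: 'IN_PROGRESS' });
    return true;
  });
}

// After the email is sent:
//   await userRef.update({ __state: 'DONE', sent_to_today: true });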
Option 3.
One cloud function runs through the firestore collection and, for each document that is to be processed, sends a Pub/Sub message which triggers another cloud function. That one works with only one firestore document. The 'state machine' control is still required (as in Option 2 above). The benefit of Option 3 is a higher level of specialisation between functions, and many of the 'second' cloud functions can run in parallel. For example:
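A sketch of the fan-out, assuming an illustrative topic name 'process-user':

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');

admin.initializeApp();
const pubsub = new PubSub();

// First function: scan the collection and fan out one message per user.
exports.scanUsers = functions.pubsub
  .schedule('every 3 minutes')
  .onRun(async () => {
    const snap = await admin.firestore().collection('users')
      .where('sent_to_today', '==', false).get();
    await Promise.all(snap.docs.map((doc) =>
      pubsub.topic('process-user').publishMessage({ json: { userId: doc.id } })
    ));
  });

// Second function: handles exactly one user per message.
exports.processUser = functions.pubsub
  .topic('process-user')
  .onPublish(async (message) => {
    const { userId } = message.json;
    // claim the document (as in Option 2), send the email, mark it DONE
  });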

Firebase Timestamp in cloud function not displaying time

I am using this to get the timestamp:
admin.database.ServerValue.TIMESTAMP
But in the log I am getting this when I console.log the variable:
{ '.sv': 'timestamp' }
Can anyone help me out with this? Actually I want to get the timestamp and then compare it with a timestamp from the database.
The admin.database.ServerValue.TIMESTAMP does not contain the actual server-side timestamp, but is merely a marker (the { '.sv': 'timestamp' } that you see). The database server recognizes this marker on write operations, and then writes the server-side timestamp in its place.
This means that you can't get the server-side timestamp without writing to the database. A simple way to see how to get this is:
let ref = firebase.database().ref("test");
ref.on("value", function(snapshot) {
  console.log(snapshot.val());
});
ref.set(admin.database.ServerValue.TIMESTAMP);
When you run this code, your log will show three values:
null
This is the current value in the database when you attach the listener with on("value"). Here I'm assuming the test node didn't exist yet, so the value is null.
1573659849577
This is an estimate that the client makes when the ref.set(...) statement executes. The client estimates what it thinks the server timestamp will be and fires a value event. You can use this to update the UI immediately, so that the user doesn't have to wait.
1573659859162
This is the value that the server actually wrote to the database, and then sent back to the client. So this is the actual server-side timestamp that you're looking for.
In theory the client-side estimate (2) and server-side value (3) may be the same, in which case you wouldn't get the third event. But I've never seen that in practice, as they're always off by at least a couple of milliseconds.
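For the original goal of comparing against a stored timestamp from inside a Cloud Function, note that the function already runs on Google's servers, so the local clock is a server clock and no write round-trip is needed. A sketch (the database path and helper are illustrative):

const admin = require('firebase-admin'); // assumes admin.initializeApp() has run

// Values written with admin.database.ServerValue.TIMESTAMP are stored as
// plain numbers (milliseconds since the epoch), so they can be compared
// directly against Date.now().
async function isOlderThanADay(path) {
  const snap = await admin.database().ref(path).once('value');
  const storedTimestamp = snap.val();
  return Date.now() - storedTimestamp > 24 * 60 * 60 * 1000;
}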

What is the use case of firebase-queue sanitize?

I am experimenting with firebase-queue. I saw the option for sanitizing. It's described in the doc as
sanitize - specifies whether the data object passed to the processing function is sanitized of internal keys reserved for use by the queue. Defaults to true.
What does it mean?
I am getting an error when I don't specify { sanitize: false }.
When the sanitize option is set, the queue sanitizes (or cleans) the input provided to the processing function so that it resembles that which the original client placed onto the queue, and doesn't contain any of the keys added by the implementation of the queue itself.
If, however, you rely on a key (usually the keys starting with an underscore, e.g. _id) that is added by the queue, and not the original client, you need to set sanitize: false so those keys are returned to your function and they're not undefined.
You can clearly see the difference with a simple processing function that just performs a console.log(data).
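For instance, a sketch with sanitize disabled (the _id and _state keys are the queue's internal keys; the queue location and payload are illustrative):

const firebase = require('firebase'); // assumes firebase.initializeApp(...) has run
const Queue = require('firebase-queue');

const ref = firebase.database().ref('queue');
const queue = new Queue(ref, { sanitize: false }, (data, progress, resolve, reject) => {
  // With sanitize: false the queue's internal keys are visible, e.g.
  //   { foo: 'bar', _id: '-Kxyz...', _state: 'in_progress', _owner: '...' }
  // With the default sanitize: true you would only see { foo: 'bar' }.
  console.log(data);
  resolve();
});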
A quick note about why these keys are removed by default: Reading or writing directly to the location (as it looks like you're perhaps doing, by passing undefined into the client SDK child() method instead of data._id) is generally a bad idea from within the worker itself as writes performed directly are not guarded by the extensive transaction logic in the queue to prevent race conditions. If you can isolate the work to taking input from the provided data field, and returning outputs to the resolve() function, you'll likely have a better time scaling up your queue.

Lua producer-consumer pattern with consumers waiting for different data

The problem
One data source generating data in format {key, value}
Multiple receivers each waiting for different key
Example
Getting data runs in a loop. Sometimes I will want to get the next value labelled with a key by using
Value = MyClass:GetNextValue(Key)
I want my code to stop there until the value is ready (making it some sort of future(?) value). I've tried using simple coroutines, but they only work when waiting for any data.
So the question I want to ask is something like: How do I implement async values in Lua using coroutines or a similar concept (without threads)?
Side notes
The main processing function will, apart from returning values to waiting consumers, process some of the incoming data (say, data labelled with a special key) itself.
The full usage context should look something like:
-- in loop
ReceiveData()
ProcessSpecialData()
--
-- Called outside the loop:
V = RequestDataWithGivenKey(Key)
How to implement async values
You start by not implementing async values. You implement async functions: you don't get the value back until it has been retrieved.
First, your code must be in a Lua coroutine. I'll assume you understand the care and feeding of coroutines. I'll focus on how to implement RequestDataWithGivenKey:
function RequestDataWithGivenKey(key)
  local request = FunctionThatStartsAsyncGetting(key)
  -- Yield until the scheduler resumes us and the request has completed.
  while not request:IsComplete() do
    coroutine.yield()
  end
  -- Request is complete. Return the value.
  return request:GetReturnedValue()
end
FunctionThatStartsAsyncGetting returns a request object. The request stores all of the data needed to process the specific request; it represents asking for the value. This should be a C function that starts the actual async getting.
The request will be either a userdata or an encapsulated Lua table that stores enough information to communicate with the C-code that's doing the async fetching. IsComplete uses the internal request data to see if that request has completed. GetReturnedValue can only be called when IsComplete returns true; it puts the value on the Lua stack, so that this function can return it.
Your external code simply needs to handle the async stuff internally. Between resumes of these Lua coroutines, you'll need to pump whatever async stuff is doing the fetching, if there are outstanding requests.
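If the fetching can stay in pure Lua rather than C, the same idea can be expressed with a table of suspended coroutines keyed by the value they're waiting for; the main loop resumes only the consumers whose key just arrived. A sketch (all names illustrative):

-- Consumers waiting per key: key -> list of suspended coroutines.
local waiting = {}

-- Called from inside a consumer coroutine; suspends until the key arrives.
local function RequestDataWithGivenKey(key)
  waiting[key] = waiting[key] or {}
  table.insert(waiting[key], coroutine.running())
  return coroutine.yield() -- the dispatcher passes the value to resume()
end

-- Called from the main loop whenever the source produces {key, value}.
local function Dispatch(key, value)
  local list = waiting[key]
  if not list then return end
  waiting[key] = nil
  for _, co in ipairs(list) do
    coroutine.resume(co, value) -- wakes the consumer with the value
  end
end

-- Usage: a consumer runs inside a coroutine.
local consumer = coroutine.create(function()
  local v = RequestDataWithGivenKey("temperature")
  print("got", v)
end)
coroutine.resume(consumer)      -- runs until the yield inside Request...

-- Main loop (sketch): receive data, handle special keys, wake consumers.
Dispatch("temperature", 21.5)   --> prints: got 21.5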
