Idempotency in Google Cloud Function HTTPS triggers - firebase

Most Firebase Cloud Functions trigger signatures include a context object that has an eventId property.
Looking at the documentation, this doesn't seem to be the case for HTTPS-triggers.
Is it safe to assume that calls to HTTP functions will only trigger once per request?

Jack's answer is mostly correct, but I'll clarify here.
The documentation on execution semantics is here. To be clear:
HTTP functions are invoked at most once. This is because of the
synchronous nature of HTTP calls, and it means that any error on
handling function invocation will be returned without retrying. The
caller of an HTTP function is expected to handle the errors and retry
if needed.
There is no guarantee that an HTTP function is executed exactly once. Some executions may fail before they reach the function. This is different from all other (background) types of function, which provide at-least-once execution:
Background functions are invoked at least once. This is because of
asynchronous nature of handling events, in which there is no caller
that waits for the response and could retry on error. An emitted event
invokes the function with potential retries on failure (if requested
on function deployment) and sporadic duplicate invocations for other
reasons (even if retries on failure were not requested).
So, for background functions to be 100% correct, they should be idempotent.
If you want to retry failed HTTP functions, the client will have to perform the retry, and in that case, you may want that HTTP function to be idempotent as well. The client will have to provide the unique key on retry, in that case.
Note that it's not possible to mark an HTTP function for internal retries. That's only possible for background functions.

HTTPS functions will only trigger once, whereas background functions have an at-least-once delivery guarantee.
(I can't find the docs where I read it. If I find them, I will update the answer.)

Related

gRPC C++: How to wait until a unary request has been sent?

I'm writing a wrapper around gRPC unary calls, but I'm having an issue: let's say I have a ClientAsyncResponseReader object which is created and starts a request like so
response_reader_ = std::unique_ptr<grpc::ClientAsyncResponseReader<ResponseType>>(
    grpc::internal::ClientAsyncResponseReaderFactory<ResponseType>::Create(
        channel.get(), completion_queue, rpc_method, &client_context_, request, true
    )
);
response_reader_->Finish(
    response_sharedptr_.get(), status_sharedptr_.get(), static_cast<void*>(some_tag)
);
// Set a breakpoint here
where all of the arguments are valid.
I was under the impression that when the Finish call returned, the request object was guaranteed to have been sent out over the wire. However by setting a breakpoint after that Finish() call (in the client program, to be clear) and inspecting my server's logs, I've discovered that the server does not log the request until after I resume from the breakpoint.
This would seem to indicate that there's something else I need to wait on in order to ensure that the request is really sent out: and moreover, that the thread executing the code above still has some sort of role in sending out the request which appears post-breakpoint.
Of course, perhaps my assumptions are wrong and the server isn't logging the request as soon as it comes in. If not though, then clearly I don't understand gRPC's semantics as well as I should, so I was hoping for some more experienced insight.
You can see the code for my unary call abstraction here. It should be sufficient, but if anything else is required I'm happy to provide it.
EDIT: The plot thickens. After setting a breakpoint on the server's handler for the incoming requests, it looks like the call to Finish generally does "ensure" that the request has been sent out, except for the first request sent by the process. I guess that there is some state maintained either in grpc::Channel or maybe even in grpc::CompletionQueue which is delaying the initial request.
From the documentation
response_reader_ = std::unique_ptr<grpc::ClientAsyncResponseReader<ResponseType>>(
    grpc::internal::ClientAsyncResponseReaderFactory<ResponseType>::Create(
        channel.get(), completion_queue, rpc_method, &client_context_, request, true
    )
);
This starts the call and writes the request out (start=true). The function has no tag parameter, so the completion queue has no way to notify you when the call start has finished. Calling an RPC method is somewhat involved: it basically means building the network packet and putting it on the wire. It can fail if there is a transient failure of the transport, if the channel is completely gone, or if the caller made a mistake. A tag notification would also be needed because the completion queue is a contention point: all RPC objects talk to it, so the completion queue may be busy while the request is still pending.
response_reader_->Finish(
    response_sharedptr_.get(), status_sharedptr_.get(), static_cast<void*>(some_tag)
);
This asks the RPC runtime to receive the server's response. When the server's response arrives, the completion queue notifies the client. At this point, we assume that there is no error on the client side, everything is okay, and the request is already in flight. So the status of the Finish call will never be false for a unary RPC.
This would seem to indicate that there's something else I need to wait on in order to ensure that the request is really sent out: and moreover, that the thread executing the code above still has some sort of role in sending out the request which appears post-breakpoint.
Perhaps you want to reuse the request object (I did some experiments on that). For my part, I keep the request object in memory until the response arrives. There is no way to guarantee that the request object won't be required after the Create call.

Trace request as it propagates through Google Cloud Functions

Say you have an HTTP endpoint which, when triggered, publishes a PubSub message and then sends a response.
There is another Cloud Function which is subscribed to this event, performs what it needs to perform, and then ends.
How would you go about tracing the entire sequence of function executions triggered by an initial request (in this example, the first HTTP request)?
I see in the Google Cloud Platform logs there is a function Execution ID, but this changes with each function that is triggered hence it's hard to follow the sequence of executions. Is there an automated way of doing this? Or does it need custom implementation?
Thanks!
You will need a custom solution. If you want to trace this all the way back to the client request, you will need to generate some unique ID on the client, and pass that along to the HTTP function, which would then pass that along to the pubsub function via the message payload. And so on.
You might find it helpful to use Stackdriver Logging to collect the logs around that unique ID.
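A minimal sketch of that custom approach follows. The function names, the X-Trace-Id header and the trace_id field are all hypothetical, and a plain callback stands in for the Pub/Sub publish call. The HTTP function reuses the ID sent by the client (or generates one), puts it in the message payload, and both functions log it so the entries can be searched by that ID.

```python
import json
import uuid


def http_function(headers, publish):
    """Hypothetical HTTP entry point: reuse the caller's trace ID or mint one."""
    trace_id = headers.get("X-Trace-Id") or str(uuid.uuid4())
    print(json.dumps({"trace_id": trace_id, "function": "http_function",
                      "message": "publishing"}))
    # Carry the ID inside the message payload so the subscriber can read it.
    publish(json.dumps({"trace_id": trace_id, "data": "work item"}))
    return trace_id


def pubsub_function(message):
    """Hypothetical Pub/Sub-triggered function: log under the same trace ID."""
    payload = json.loads(message)
    trace_id = payload["trace_id"]
    print(json.dumps({"trace_id": trace_id, "function": "pubsub_function",
                      "message": "processing"}))
    return trace_id
```

Each further function in the chain would repeat the second step: read the ID from the incoming payload, log with it, and include it in anything it publishes.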

Multiple Firestore changes with batch vs cloud functions

In a chat app, if I add a new message to the messages collection, I also need to update that particular chat's document in another collection to show the last message and the time when it was sent. Right now I am triggering a cloud function every time a new message comes, in order to update the metadata for the chat. Am I doing the right thing or would it be more appropriate to use Batched writes instead?
There is a difference that you should be aware of when using one approach vs. the other. When using a batched write, according to the official documentation:
You can execute multiple write operations as a single batch that contains any combination of set(), update(), or delete() operations. A batch of writes completes atomically and can write to multiple documents.
This means that updates made in this atomic way either all succeed or all fail.
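As a rough sketch of the batched approach for the chat case, the function below writes the message and the chat metadata in one batch. It assumes db is a google.cloud.firestore.Client; the collection names ("messages", "chats") and field names are hypothetical, and a client-side timestamp is used only to keep the example self-contained.

```python
from datetime import datetime, timezone


def send_message(db, chat_id, sender_id, text):
    """Write the message and the chat metadata in one atomic batch.

    `db` is assumed to be a google.cloud.firestore.Client; the collection
    and field names are hypothetical.
    """
    sent_at = datetime.now(timezone.utc)
    message_ref = db.collection("messages").document()  # auto-generated ID
    chat_ref = db.collection("chats").document(chat_id)

    batch = db.batch()
    batch.set(message_ref, {
        "chatId": chat_id,
        "senderId": sender_id,
        "text": text,
        "sentAt": sent_at,
    })
    batch.update(chat_ref, {"lastMessage": text, "lastMessageAt": sent_at})
    batch.commit()  # either both writes are applied or neither is
    return message_ref
```

Because update() requires the target document to exist, a missing chat document would make the whole batch fail, so the message would not be written either.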
In case you are using a function that is triggered once a message is sent, it means that you are performing two separate actions. The first one is to send a message and the second one is to update some metadata once the message is successfully sent. In this case, you can send a message but your function may fail, according to the official documentation:
By default, without retries enabled, the semantics of executing a background function are "best effort." This means that while the goal is to execute the function exactly once, this is not guaranteed.
These are the reasons why background functions fail to complete:
On rare occasions, a function might exit prematurely due to an internal error, and by default the function might or might not be automatically retried.
More typically, a background function may fail to successfully complete due to errors thrown in the function code itself. Some of the reasons this might happen are as follows:
- The function contains a bug and the runtime throws an exception.
- The function cannot reach a service endpoint, or times out while trying to reach the endpoint.
- The function intentionally throws an exception (for example, when a parameter fails validation).
- When functions written in Node.js return a rejected promise or pass a non-null value to a callback.
The workaround in this case is to enable retries to handle transient errors.

firebase HTTP function termination

Is it OK practice to put additional logic into a Firebase HTTPS function, after the response was sent?
I have functions where this is happening:
1. write to the Firebase DB
2. once the write is done, I send back the response (this is where res.status(200 / 500).send() is called)
3. I look up some FCM tokens in the DB and send a push message (it does not matter from a requester perspective if this is successful or not)
I understand that another pattern could be that I move step 3 to another DB trigger function to do the messaging. That would introduce some delay as I'd need to wait for that DB trigger function to fire.
My question is: is it safe to put additional logic into an HTTPS function after the response is sent, or may Firebase already start to clean up / terminate my function?
firebaser here
While your sending of FCM messages (in step 3) may frequently work, it is not reliable. There is no guarantee that the HTTP-triggered function will continue running after a response has been sent.
Precisely for this reason the Firebase documentation says:
HTTP functions are synchronous, so you should send a response as quickly as possible and defer work using Cloud Firestore.
So in your case, the documentation explicitly says to put the sending of the notification into a database-triggered function.

Web API 2 - are all REST requests asynchronous?

Do I need to do anything to make all requests asynchronous or are they automatically handled that way?
I ran some tests and it appears that each request comes in on its own thread, but I figure better to ask as I might have tested wrong.
Update: (I have a bad habit of not explaining fully - sorry) Here's my concern. A client browser makes a REST request to my server of http://data.domain/com/employee_database/?query=state:Colorado. That comes in to the appropriate method in the controller. That method queries the database and returns an object which is then turned into a JSON structure and returned to the calling app.
Now let's say 10,000 clients all make a similar query to the same server. So I have 10,000 requests coming in at once. Will my controller method be called simultaneously in 10,000 distinct threads? Or must the first request return before the second request is called?
I'm not asking about the code in my handler method having asynchronous components. For my case the request becomes a single SQL query so the code has nothing that can be handled asynchronously. And until I get the requested data, I can't return from the method.
No, REST is not async by default; requests are handled synchronously. However, your web server (IIS) has a setting for the maximum number of threads that can work at the same time, and it maintains a queue of the requests received. So a request goes into the queue; if a thread is available it gets executed, otherwise it waits in the IIS queue until a thread is available.
I think you should be using async IO operations, such as async database calls, in your case. Yes, in Web API every request has its own thread, but threads can run out if there are many concurrent requests. Threads also use memory, so if your API gets hit by too many requests it may put pressure on your system.
The benefit of using async over sync is that you use your system resources wisely. Instead of blocking the thread while it waits for the database call to complete, as a sync implementation does, async frees the thread to handle more requests or do whatever other work needs a thread. Once the IO (database) call completes, a thread (possibly a different one) picks up from there and continues. Async does not make a single IO call finish sooner, but it can improve your API's throughput under load when IO operations take a long time to complete.
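The effect can be illustrated outside ASP.NET. The sketch below uses Python's asyncio purely as an analogy (it is not Web API code, and handle_request, serve and run_demo are made-up names): 50 simulated requests each wait 0.1 seconds on a fake database call, yet a single thread serves them all because each await releases the thread while the IO is pending.

```python
import asyncio
import threading
import time


async def handle_request(request_id):
    """Simulated handler: the await releases the thread while the 'query' runs."""
    await asyncio.sleep(0.1)  # stands in for an async database call
    return request_id, threading.get_ident()


async def serve(n):
    # Start n handlers at once; none of them holds a thread while waiting.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


def run_demo(n=50):
    """Return (requests served, distinct threads used, elapsed seconds)."""
    started = time.perf_counter()
    results = asyncio.run(serve(n))
    elapsed = time.perf_counter() - started
    thread_ids = {thread_id for _, thread_id in results}
    return len(results), len(thread_ids), elapsed
```

Handled one after another with blocking calls, the same 50 requests would occupy a thread for about 5 seconds in total; here the waits overlap, so the elapsed time stays close to a single 0.1-second wait.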
To be honest, your question is not very clear. If you are making an HTTP GET using HttpClient, say with the GetAsync method, the request is fired and you can do whatever you want in your thread until you get the response back. So this request is asynchronous.
If you are asking about the server side that handles this request (assuming it is ASP.NET Web API), then whether it is asynchronous depends on how you implemented your web API. If your action method does three things, say 1, 2, and 3, one after the other synchronously in blocking mode, the same thread is going to service the whole request. On the other hand, say #2 above is a call to a web service over HTTP. If you use HttpClient and make the call asynchronously, you can get into a situation where one request is serviced by more than one thread. For that to happen, you must make the HTTP call from your action method asynchronously and use the async keyword. In that case, when you call await inside the action method, execution of the action method returns and the thread servicing your request is free to service some other request; when the response is available, the same or some other thread continues from where it left off.
Long, boring answer, perhaps, but it is difficult to explain just by typing, I guess. Hope you get some clarity.
UPDATE:
Your action method will execute in parallel in 10,000 threads (ideally). I say ideally because a CLR thread pool with 10,000 threads is not typical and is probably impractical as well. There are physical limits as well as limits imposed by the framework, but I guess the answer to your question is that the requests will be serviced in parallel. The correct term here is 'parallel', not 'async'.
Whether it is sync or async is your choice. You choose by the way you write your action. If you return a Task, and also use async IO under the hood, it is async. In other cases it is synchronous.
Don't feel tempted to slap async on your action and use Task.Run. That is async-over-sync (a known anti-pattern). It must be truly async all the way down to the OS kernel.
No framework can make sync IO automatically async, so it cannot happen under the hood. Async IO is callback-based which is a severe change in programming model.
This does not answer what you should do of course. That would be a new question.
