What is the best practice for handling asynchronous API calls that take time?

Suppose I have an API that creates a cloud instance asynchronously. The API call returns a success response right away, but the cloud instance has not been initialized yet. Creating the instance takes 1-2 minutes, and only then is its information (e.g. IP, hostname, OS) saved to the database, which means I have to wait 1-2 minutes before I can fetch the data again to show it. At first I tried making a loading component, but the problem is that I don't know when the cloud instance has been initialized (each instance takes a different amount of time to create). I'm considering using WebSockets or cron, or should I redesign my API? Has anyone designed an asynchronous system before? How do you handle such a case?

If the API that you call gives you no information on when it's done with its asynchronous processing, it seems to me that you'll have to check at intervals until you find that the resource is ready; i.e. to poll it.
This seems to me to roughly fit the description and intent of the Polling Consumer pattern. In general, for asynchronous systems design, I can't recommend Enterprise Integration Patterns enough.
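A minimal sketch of such a polling loop, assuming the caller supplies a fetch function that returns None until the resource is ready. The names poll_until_ready, fetch, interval_s and timeout_s are made up for illustration, and the timeout is an addition so the loop cannot run forever:

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[dict]],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() at intervals until it returns a resource or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        resource = fetch()  # hypothetical: returns None while not ready
        if resource is not None:
            return resource
        # Give up if the next wait would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("resource was not ready before the timeout")
        sleep(interval_s)
```

In a real client, fetch would wrap the GET request that reads the instance record. A fixed interval is the simplest choice; backing off between attempts is a common refinement.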

As others noted, you can either have a notification channel using WebSockets or poll the backend. Personally I'd probably go with the latter for this case, and would actually create several APIs: one for initiating the work, which returns a URL with a "job id" in it, and others where the status of the job can be polled.
RESTfully that would look something like POST /instances to initiate a job, GET /instances to see all the instances that are running/created/stopped, and GET /instances/<id> to see the status of a given instance (initiating, failed, running or whatever).
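To make the shape of those three endpoints concrete, here is a rough in-memory sketch. It is not a real web server: the class and method names are hypothetical, and returning 202 with a status_url field is just one reasonable choice, not something the API in the question is known to do.

```python
import uuid


class InstanceService:
    """In-memory stand-in for the three endpoints (illustration only)."""

    def __init__(self):
        self._instances = {}

    def post_instances(self, spec: dict):
        """POST /instances: record the job and return immediately."""
        instance_id = uuid.uuid4().hex
        self._instances[instance_id] = {
            "id": instance_id,
            "status": "initiating",
            "spec": spec,
        }
        # Assumption: 202 Accepted plus a URL the client can poll.
        return 202, {"status_url": f"/instances/{instance_id}"}

    def get_instances(self):
        """GET /instances: list every known instance."""
        return 200, list(self._instances.values())

    def get_instance(self, instance_id: str):
        """GET /instances/<id>: current status of one instance."""
        instance = self._instances.get(instance_id)
        if instance is None:
            return 404, {"error": "not found"}
        return 200, instance

    def mark_running(self, instance_id: str, ip: str, hostname: str):
        """Called by the background provisioner once the instance is up."""
        self._instances[instance_id].update(
            status="running", ip=ip, hostname=hostname
        )
```

The client keeps the returned URL and polls it until the status moves from initiating to running or failed.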

WebSockets would work, but might be overkill for this use case. I would probably display a status of 'creating' or something similar after receiving the success response from the API call, and then start polling the API to see if the creation process has finished.

Related

Cloud Tasks - waiting for a result

My application needs front-end searching. It searches an external API, for which I'm limited to a few calls per second.
So, I wanted to keep ALL queries, related to this external API, on the same Cloud Task queue, so I could guarantee the amount of calls per second.
That means the user would most likely have to wait a second or two when searching.
However, using Google's const { CloudTasksClient } = require('@google-cloud/tasks') library, I can create a task, but when I go to check its status using .getTask() it says:
The task no longer exists, though a task with this name existed recently.
Is there any way to poll a task until it's complete and retrieve response data? Or any other recommended methods for this? Thanks in advance.
No. GCP Cloud Tasks provides no way to gather information on the body of requests that successfully completed.
(Which is a shame, because it seems quite natural. I just wrote an email to my own GCP account rep asking about this possible feature. If I get an update I'll put it here.)

IIS request with quick response but continue to process

I'm working on an API (a pragmatic REST API or something very similar). I would like to know if it is possible to make an API request that returns a quick response (in JSON) and continues to process heavy code in the background.
I suppose this is possible by using a queue system, but I have no idea where to start with this.
You can have your API delegate long running things to another process.
You mentioned queues, that's one way of doing things, all you need really is an application which can execute whatever long running tasks you have.
Let's imagine a simple system that can do this.
Your API receives a request to do something.
Instead of doing this something, the API writes one record into a database with the details of what needs to be done. Another app watches that table, sees a new record, runs the thing, updates the record with the status / result / whatever it needs.
On any requests from now on, the API can check the record and return whatever is there.
This is the simplest thing I can think of. You can easily do other things as well, talk to a queue system, send it data, let something else execute it.
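Here is a rough sketch of that record-based flow using SQLite. The table layout, status values and function names are all made up for illustration, and the worker here does a single pass rather than watching the table continuously:

```python
import sqlite3


def init_jobs_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    conn.commit()


def submit_job(conn: sqlite3.Connection, payload: str) -> int:
    """API side: write one record describing the work, return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def get_job(conn: sqlite3.Connection, job_id: int):
    """API side: return whatever the record currently says."""
    row = conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return None if row is None else {"status": row[0], "result": row[1]}


def run_pending_jobs(conn: sqlite3.Connection, handler) -> int:
    """Worker side: run every pending record once and store the outcome."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        try:
            result, status = handler(payload), "done"
        except Exception as exc:  # record the failure instead of crashing
            result, status = str(exc), "failed"
        conn.execute(
            "UPDATE jobs SET status = ?, result = ? WHERE id = ?",
            (status, result, job_id),
        )
        conn.commit()
    return len(rows)
```

In a real deployment the worker would be a separate process calling run_pending_jobs in a loop, and with more than one worker you would also need a way to claim a row so that two workers don't run the same job.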
Looking at your comments, what you are suggesting is not really a good way of building APIs. Why do I say this?
Well, let's say that you receive a request, the API starts a work thread and sends back a 200 to the client. Great, the client knows work has started, but how does it know when that process has ended, and how does it receive whatever data it expects back?
Let's go a bit deeper next.
What happens when 1000 clients call that one endpoint and your API is attempting to start 1000 work threads? You've killed your API, no work gets done and no client gets anything.
This is why I suggest to delegate the work to something else, not the API. Let the API do what it does best, run quick things and return results and delegate other things to something else.
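To illustrate the difference, here is a small sketch in which a fixed number of workers drain a shared queue, so 1000 requests become 1000 queued items rather than 1000 threads. It uses threads in a single process only to keep the example short (the advice above is to move the work out of the API altogether), and start_workers and its parameters are invented names:

```python
import queue
import threading


def start_workers(jobs: queue.Queue, results: dict, handler, num_workers: int = 4):
    """Start a fixed pool of threads that process items from the queue."""

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: stop this worker
                jobs.task_done()
                return
            job_id, payload = item
            try:
                results[job_id] = ("done", handler(payload))
            except Exception as exc:
                results[job_id] = ("failed", str(exc))
            finally:
                jobs.task_done()

    threads = [
        threading.Thread(target=worker, daemon=True) for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    return threads
```

The request handler only calls jobs.put((job_id, payload)) and returns; the number of threads stays at num_workers no matter how many clients call the endpoint.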

HTTP request via RabbitMQ?

I am designing a system in which one component gives me approximately 50 outputs. I then start up VM instances for each of the 50 outputs, pass the outputs as inputs and run a process which can take 10 - 60 minutes on each of the instances.
Currently, when I get my output data, what I do is add each output to a message queue (rabbitmq) and then send an HTTP request to a cloud function. This cloud function basically creates 'self-destructing' instances for each output. The HTTP request has the "number_of_req_instances" and then each instance acts as a consumer, and picks one task from the queue.
I was wondering, is there any way to send the HTTP request from RabbitMQ? Or what's the best practice for handling this sort of use case? I'm not entirely happy that my HTTP request to create instances and the population of my queue are two steps.
I not only need to pass the output as input, but I also need to start up the instances. I also like the fact that RabbitMQ works quite well with the acknowledgement of messages, so I'm keen to keep that as part of the system. I could however use HTTP requests to pass all the information and feed it to the metadata of the instances. But that's not ideal since the HTTP response would be direct and I wouldn't know if any of the tasks failed as opposed to using RabbitMQ.
Any suggestions?
You could look into a solution where a Cloud Function is triggered by a Pub/Sub message. The output would be sent to a topic in Pub/Sub. This topic is set as the trigger that launches the function once a message is published to it. The Cloud Function will ingest the Pub/Sub message containing the output and process it.
You may want to look into the documentation for Cloud Functions triggered by Pub/Sub. There are also some architecture references you might find interesting, i.e. The serverless event driven
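A rough sketch of what the receiving function might look like in Python. It assumes the background-function style, where the handler gets an event dict whose data field is the base64-encoded message body, and it assumes the publisher sends JSON. The output_id field and the function names are made up for illustration, so check the Cloud Functions documentation for the exact signature your runtime uses:

```python
import base64
import json


def decode_pubsub_event(event: dict) -> dict:
    """Decode the base64 'data' field of a Pub/Sub event into a dict."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)  # assumption: the publisher sent JSON


def handle_output(event: dict, context=None) -> str:
    """Entry point: one invocation per published output."""
    output = decode_pubsub_event(event)
    # A real function would create the instance here and pass the
    # output along; this sketch only reports what it would do.
    return f"would start instance for output {output['output_id']}"
```

With this layout, publishing the output and triggering the instance creation become a single step.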

Why do we need a queue when using webhooks?

Can anyone clarify the purpose of using a queue?
What I understand is that a webhook is just a URL: you make a POST request to that URL and then do some stuff based on the body/data of the request. So why do I need to queue the data, store it in a database, then loop through the database again and perform the stuff?
The short answer is, you don't have to use a queue. A webhook is just an HTTP request (typically POST) notifying your application of some type of event. The reason you might want to consider a queue is because of typical issues you could run into.
One of these is the response time back to the webhook requester (the source). Many sources want a response (HTTP status 200) as quickly as possible so they can dequeue the request from their webhook system. If processing the webhook takes some time, a source will typically advise you to use a queue to defer the lengthier processing so that it runs asynchronously, after the 200 response has been sent.
Another possible reason could be for removing duplicate requests. There is no guarantee with webhooks that you will only receive a single request per event. A queue can be used to de-dupe these requests.
I would recommend you stick with a simple request handler if possible, then evolve a more sophisticated handler if you run into issues. Consider queues as a potential design approach if you run into issues like those above.
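If you do end up needing a queue, a minimal sketch of the idea could look like this. It assumes each event carries a unique "id" field (many webhook sources send one, but check yours), and the class and method names are invented for the example:

```python
import queue


class WebhookReceiver:
    """Acknowledge webhooks quickly, de-dupe them, and defer the work."""

    def __init__(self):
        self.pending = queue.Queue()
        self._seen_ids = set()

    def receive(self, event: dict) -> int:
        """Called by the HTTP handler; returns the status code to send."""
        event_id = event["id"]  # assumption: the source sends a unique id
        if event_id not in self._seen_ids:
            self._seen_ids.add(event_id)
            self.pending.put(event)
        # Duplicates are acknowledged too, so the source stops retrying.
        return 200

    def drain(self, handler) -> int:
        """Called later by a worker; runs the slow processing."""
        processed = 0
        while True:
            try:
                event = self.pending.get_nowait()
            except queue.Empty:
                return processed
            handler(event)
            processed += 1
```

This keeps everything in memory, so queued events are lost on restart; a durable queue or a database table avoids that.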
You need some way to prevent a conflict if the webhook is invoked multiple times very close together.
It doesn't necessarily have to be a queue, though. If the webhook performs database queries and updates, you can use a transaction to ensure that this is atomic for each invocation.
In this respect, it's little different from any other web utility. You should do something similar in scripts that process web forms.
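As a sketch of the transaction approach, assuming SQLite and an event id supplied by the webhook source: the handler records the event id and applies its change in one transaction, so a repeated or near-simultaneous delivery of the same event cannot apply the change twice. The tables and names are hypothetical.

```python
import sqlite3


def init_webhook_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances ("
        " account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.commit()


def apply_event(conn: sqlite3.Connection, event_id: str, account: str, delta: int) -> bool:
    """Apply the event exactly once; return False if it was seen before."""
    try:
        with conn:  # commits on success, rolls back on any exception
            # The primary key rejects a second insert of the same event id.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
                (account,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the insert and the update share a transaction, either both happen or neither does.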

Handling changes in user Google Tasks using GTasks API

We are building a service that will synchronize with users' Google Tasks data, so if a user adds/edits/deletes a task in GTasks, it will be added/edited/deleted in our service.
There is a big problem with synchronization: as far as I can see, the GTasks API does not provide any onUpdate/onChange event listeners. The perfect solution would be a Google Tasks API method for setting a callback URL that is requested when a user adds/edits/deletes tasks.
But I can't find such a method in the Google Tasks API, so for now there is only one very bad way to sync with it: request all of a user's tasks and compare them with the tasks in our service. This is a very bad way to sync, because if we have 10k users and want their task lists to be synchronized within 1 minute, we will need to make > 10k GTasks API requests per minute :(
I hope that I'm wrong and there is some way to set an onChange/onUpdate callback for user tasks. Or maybe there is some other way to receive notifications of a user's GTasks changes (by email, etc.).
Does anybody know?
Thank you.
You could use the updatedMin parameter to only get tasks that have been updated since a given timestamp, as described in the documentation.
You should be able to rely on ETag and If-None-Match headers when querying a user's task lists to get a 304 Not Modified if no tasks in the list have changed. (Note that this should also work when polling individual tasks.)
This way you can effectively poll for the tasks that have changed since the last time you synced.
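A rough sketch of both ideas, with the HTTP call abstracted away. The fetch callable is hypothetical: it stands in for whatever client you use and is assumed to return a (status, etag, items) tuple. Only the updatedMin parameter and the If-None-Match header come from the approach described above.

```python
from typing import Callable, Optional


def updated_since_params(last_sync_rfc3339: Optional[str]) -> dict:
    """Query parameters asking only for tasks changed since the last sync."""
    if last_sync_rfc3339 is None:
        return {}  # first sync: fetch everything
    return {"updatedMin": last_sync_rfc3339}


class EtagPoller:
    """Remember the last ETag and skip processing when nothing changed."""

    def __init__(self, fetch: Callable[[dict], tuple]):
        self._fetch = fetch  # hypothetical: headers -> (status, etag, items)
        self._etag = None

    def poll(self):
        headers = {} if self._etag is None else {"If-None-Match": self._etag}
        status, etag, items = self._fetch(headers)
        if status == 304:
            return None  # Not Modified: nothing to sync
        self._etag = etag
        return items
```

This still costs one request per user per interval, but the unchanged case becomes cheap on both sides.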
