AWS SNS for long-running API calls

Hi. We subscribe our API service to AWS SNS to get guaranteed execution and a retry mechanism. Unfortunately, our API call takes more than 30 seconds to complete its task. Since SNS waits less than 30 seconds for a response, it treats the call as failed and retries the API, even though my first call succeeds after 30 seconds. Is there any way to increase the SNS response timeout (e.g. wait 2 or 3 minutes for a response), or to stop SNS retries dynamically? Or please suggest some other mechanism to run these background jobs with a retry policy.

For this type of use-case, you might want to consider publishing to an SQS queue from your SNS topic and then having your application poll the queue to find jobs to execute. Since SNS won't be calling your service directly, there is no timeout and you're free to take as much time as needed to complete the job.
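A minimal sketch of such a polling consumer, assuming a boto3-style SQS client and an already-created queue URL (both placeholders here). The key point is that the message is only deleted after the handler finishes, so SQS's visibility timeout gives you retry semantics without SNS's 30-second limit:

```python
def poll_and_process(sqs, queue_url, handler, max_batches=None):
    """Long-poll the queue; delete each message only after the handler
    succeeds, so a failed or interrupted job becomes visible again and
    is retried automatically."""
    batches = 0
    while max_batches is None or batches < max_batches:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling: wait up to 20 s for work
        )
        for msg in resp.get("Messages", []):
            handler(msg["Body"])  # may take minutes; no timeout applies here
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
        batches += 1
```

In a real deployment you would pass `boto3.client("sqs")` as `sqs` and make the queue's visibility timeout comfortably longer than the worst-case job duration, so in-flight jobs aren't redelivered while still running.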

Related

Firebase Cloud Messaging: "Topic Quota Exceeded"

I have a webapp and a Windows Service which communicate using Firebase Cloud Messaging. The webapp subscribes to a couple of Topics to receive messages, and the Windows Service sends messages to one of these Topics. In some cases there can be several messages per second, and it gives me this error:
FirebaseAdmin.Messaging.FirebaseMessagingException: Topic quota exceeded
I don't quite get it. Is there a limit to the messages that can be sent to a specific topic, or what does this mean?
So far I have only found info about topic names and subscription limits; I couldn't find anything about a "topic quota", except maybe this page of the docs (https://firebase.google.com/docs/cloud-messaging/concept-options#fanout_throttling), although I am not sure it refers to the same thing, or whether and how it can be changed. I can't find anything in the Firebase Console either. Has anybody got an idea?
Well.. from this document it seems pretty clear that this can happen:
The frequency of new subscriptions is rate-limited per project. If you
send too many subscription requests in a short period of time, FCM
servers will respond with a 429 RESOURCE_EXHAUSTED ("quota exceeded")
response. Retry with exponential backoff.
I do agree that the document should've stated how much traffic triggers the blocking mechanism instead of just telling the developer to "Retry with exponential backoff". But, at the end of the day, Google also produced this document to help developers understand how to properly implement this mechanism. In a nutshell:
If the request fails, wait 1 + random_number_milliseconds seconds and
retry the request.
If the request fails, wait 2 + random_number_milliseconds seconds and
retry the request.
If the request fails, wait 4 + random_number_milliseconds seconds and
retry the request.
And so on, up to a maximum_backoff time.
My conclusion: reduce the number of messages sent to the topic, OR implement a retry mechanism to recover from unsuccessful attempts.
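The backoff recipe quoted above can be sketched in a few lines. This is a generic illustration, not FCM-specific code: `send` stands in for whatever call returns the 429, and the delays follow the doubling-plus-jitter pattern from the docs:

```python
import random
import time

def send_with_backoff(send, max_retries=5, max_backoff=64.0):
    """Call `send`, retrying on failure with exponential backoff plus
    random jitter: roughly 1 s, 2 s, 4 s, ... capped at max_backoff."""
    for attempt in range(max_retries):
        try:
            return send()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(min(2 ** attempt + random.random(), max_backoff))
```

In a real client you would catch only the quota/429 exception type rather than bare `Exception`, so genuine bugs still fail fast.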
It could be one of these issues:
1. Too high subscriptions rates
Like noted here
The frequency of new subscriptions is rate-limited per project. If you send too many subscription requests in a short period of time, FCM servers will respond with a 429 RESOURCE_EXHAUSTED ("quota exceeded") response. Retry with exponential backoff.
But this doesn't seem to be your problem, as you don't open new subscriptions but instead send messages at a high rate.
2. Too many messages sent to one device
Like noted here
Maximum message rate to a single device
For Android, you can send up to 240 messages/minute and 5,000 messages/hour to a single device. This high threshold is meant to allow for short term bursts of traffic, such as when users are interacting rapidly over chat. This limit prevents errors in sending logic from inadvertently draining the battery on a device.
For iOS, we return an error when the rate exceeds APNs limits.
Caution: Do not routinely send messages near this maximum rate. This
could waste end users’ resources, and your app may be marked as
abusive.
Final notes
Fanout throttling doesn't seem to be the issue here, as that rate limit is really high.
The best ways to fix your issue would be:
Lower your rates, control the number of "devices" notified, and overall limit your usage over short periods of time
Keep your rates as they are but implement a back-off retry policy in your Windows Service App
Maybe look into a service more suited to your usage (as FCM is strongly focused on end-client notification), like Pub/Sub

Jenkins job to store and send http POST requests

For my incoming write traffic via HTTP POST API, I need to maintain the order of writes. For this, I need to create a Jenkins Job that gets triggered via remote API call on each request.
My question is, does Jenkins have a readily-available plugin to do this? Is there any such thing as a queue in Jenkins that stores API requests and keeps triggering a job (which will create the relevant request and send it) while the queue is not empty?
In case of failure, I need to retry and hold the remaining requests.
When you trigger Jobs (remotely or manually), they will be queued up. Unless you have configured parallel execution with multiple Job executors, the Job queue will be processed in order, and within each Job you can specify what you want to do on failure.
Having said that, there is no way in Jenkins to just store HTTP requests (at least not out of the box) and trigger Jobs based on them.
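Since Jenkins won't store the requests for you, the ordering-with-retry semantics the question asks for can be sketched outside Jenkins. This is a hypothetical illustration (the `send` callable and queue contents are placeholders): the head request is retried, and everything behind it is held until the head succeeds, preserving write order:

```python
from collections import deque

def drain_in_order(queue, send, max_attempts=3):
    """Process queued POST payloads strictly in order. If the head
    request keeps failing, stop and hold it plus everything behind it,
    so ordering is preserved for a later retry."""
    while queue:
        payload = queue[0]
        sent = False
        for _ in range(max_attempts):
            try:
                send(payload)
                sent = True
                break
            except Exception:
                pass  # retry this same payload
        if not sent:
            return False  # head still failing: keep remaining requests queued
        queue.popleft()
    return True
```

A small service running this loop (with `send` doing the real HTTP POST, e.g. via `requests.post`) gives you the "queue that holds the remainder on failure" behavior that Jenkins itself doesn't provide.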

What is the best practice for handling asynchronous API calls that take time

So suppose I have an API to create a cloud instance asynchronously. After I make the API call it just returns a success response, but the cloud instance has not been initialized yet. It takes 1-2 minutes to create the cloud instance, and only after that is the instance information (e.g. IP, hostname, OS) saved to the DB, which means I have to wait 1-2 minutes before I can fetch the data again to show the cloud information. At first I tried making a loading component, but the problem is that I don't know when the cloud instance is initialized (each instance takes a different amount of time to create). I'm considering using WebSockets or a cron job, or should I redesign my API? Has anyone designed an asynchronous system before, and how do you handle such a case?
If the API that you call gives you no information on when it's done with its asynchronous processing, it seems to me that you'll have to check at intervals until you find that the resource is ready; i.e. to poll it.
This seems to me to roughly fit the description and intent of the Polling Consumer pattern. In general, for asynchronous systems design, I can't recommend Enterprise Integration Patterns enough.
As others noted, you can either have a notification channel using WebSockets or poll the backend. Personally I'd probably go with the latter for this case, and would actually create several APIs: one for initiating the work, which returns a URL with a "job id" in it where the status of the job can be polled.
RESTfully that would look something like: POST /instances to initiate a job, GET /instances to see all the instances that are running/created/stopped, and GET /instances/<id> to see the status of a given instance (initiating, failed, running, or whatever).
WebSockets would work, but might be overkill for this use case. I would probably display a status of 'creating' or something similar after receiving the success response from the API call, and then start polling the API to see if the creation process has finished.
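The polling loop both answers describe can be sketched as below. The status endpoint and the state names ('creating', 'running') are assumptions for illustration; `get_status` stands in for one HTTP GET against the hypothetical GET /instances/<id>:

```python
import time

def wait_for_instance(get_status, poll_interval=5.0, timeout=300.0):
    """Poll the instance-status endpoint until the instance leaves the
    'creating' state, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()  # e.g. one GET /instances/<id> per call
        if status != "creating":
            return status      # 'running', 'failed', ...
        time.sleep(poll_interval)
    raise TimeoutError("instance not ready within timeout")
```

Choosing a poll interval of a few seconds keeps load on the backend negligible for a 1-2 minute job, which is why polling is usually preferred over WebSockets here.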

HTTP request via RabbitMQ?

I am designing a system, one component of the system gives me approx 50 outputs. I then start up VM instances for each of the 50 outputs, pass the outputs as inputs and run a process which can take 10 - 60 minutes on each of the instances.
Currently, when I get my output data, what I do is add each output to a message queue (rabbitmq) and then send an HTTP request to a cloud function. This cloud function basically creates 'self-destructing' instances for each output. The HTTP request has the "number_of_req_instances" and then each instance acts as a consumer, and picks one task from the queue.
I was wondering, is there any way to send the HTTP request from RabbitMQ? Or what's the best practice for handling this sort of use-case? I'm not entirely happy that the HTTP request to create instances and the population of my queue are two separate steps.
I not only need to pass the output as input, but I also need to start up the instances. I also like the fact that RabbitMQ works quite well with acknowledgement of messages, so I'm keen to keep that as part of the system. I could, however, use HTTP requests to pass all the information and feed it to the metadata of the instances. But that's not ideal, since the HTTP response would be direct and I wouldn't know if any of the tasks failed, as opposed to using RabbitMQ.
Any suggestions?
You could look into a solution with a Cloud Function being triggered by a Pub/Sub message. The output would be sent to a topic in Pub/Sub, and that topic is set as a trigger that launches the Function whenever a message is published to it. The Cloud Function will ingest the Pub/Sub message containing the output and process it.
You can look at this documentation for Cloud Functions triggered by Pub/Sub. There are also some architecture references you might find interesting, e.g. the serverless event-driven pattern.
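For reference, a Pub/Sub-triggered background Cloud Function (1st-gen Python signature) is just a function receiving the event dict; the message body arrives base64-encoded in `event["data"]`. The instance-launching step is left as a placeholder comment, since that part depends on your setup:

```python
import base64

def handle_output(event, context):
    """Entry point for a background Cloud Function triggered by Pub/Sub.
    `event["data"]` holds the base64-encoded message body published by
    the producer."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    # ...start the self-destructing instance for this output here...
    return payload
```

This collapses the two steps the question complains about: publishing the output to the topic is the only action, and the Function (instance creation) fires from that same event.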

Google Cloud Tasks: some tasks remain in queue instead of being processed

I have a Google Cloud Tasks queue (rate: 10/s, bucket: 200, concurrent: 1) that dispatches tasks to a worker in an App Engine service (Python 2.7 runtime). Tasks are normally added to the queue at about 3-4/s, and each task is processed one at a time (no concurrency).
In general, each task is processed very fast (less than 1 s). Surprisingly, the queue sometimes randomly "pauses" a small subset of 5-20 tasks. New incoming tasks are processed as usual, but those ones are blocked and stay in the queue for some minutes, even when the worker is idle and could process them. After 7-9 minutes, they are processed automatically without any other interaction. The issue is that this delay is too long and not acceptable :(
While "paused", I can manually execute those tasks by clicking the "Run" button, and they are immediately processed. So I'd rule out some kind of limitation on the worker side.
I tried redeploying the queue.yaml. I also tried pausing and resuming the queue. Both with no effect.
No errors are notified. Tasks are not retried, just ignored for some minutes.
Has anybody experienced this behavior? Any help will be appreciated. Thanks.
Cloud Tasks now uses gcloud (Cloud SDK) to manage queue configuration. queue.yaml is part of the legacy App Engine SDK for App Engine Task Queues, and uploading a queue.yaml when using Cloud Tasks may cause your queue to be disabled or paused.
To learn more about queue management, see Using Queue Management versus queue.yaml.
To learn more about migrating from Task Queues to Cloud Tasks, see Migrating from Task Queues to Cloud Tasks.
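If that's what happened here, the gcloud equivalents of the queue.yaml settings look roughly like this (QUEUE_NAME is a placeholder; run against your own project):

```shell
# Inspect the current queue configuration
gcloud tasks queues describe QUEUE_NAME

# Adjust dispatch limits without touching queue.yaml
gcloud tasks queues update QUEUE_NAME \
    --max-dispatches-per-second=10 \
    --max-concurrent-dispatches=1

# If the queue ended up paused, resume it
gcloud tasks queues resume QUEUE_NAME
```

Managing the queue exclusively through gcloud (or the Cloud Tasks API) avoids the queue.yaml/Cloud Tasks conflict the answer describes.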
