Getting around SNS API call limits to publish - amazon-sns

I need to publish a unique message to potentially thousands of device endpoints simultaneously.
The message is unique for each endpoint, so I can't group the endpoints into topics...
Although I can't find any documentation on it, SNS seems to allow only 10 concurrent Publish API requests.
More than 10 concurrent requests returns
RequestError: send request failed
caused by: Post https://sns.us-east-1.amazonaws.com/: dial tcp 54.239.24.226:443: i/o timeout
And then it seems to block my IP from further requests for a short time...
I was planning for the whole app backend to be "serverless", which would mean a scheduled task in Lambda making the SNS Publish calls...
1000 SNS Publish requests / 10 concurrent = 100 batches... This would mean it takes 100 * x seconds to process all the messages, which would hit the API Gateway and Lambda timeout limits (and would also add to the costs).
Is there a good way around these limits? An increase in the allowed number of concurrent API calls would be nice...
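To illustrate what I mean by batching, here is a simplified sketch of the fan-out (shown with boto3 just for brevity; the endpoint ARNs and per-device messages are placeholders):

```python
# Simplified sketch of the fan-out I have in mind (boto3 shown for brevity;
# the endpoint ARNs and per-device messages are placeholders).
from concurrent.futures import ThreadPoolExecutor

import boto3

sns = boto3.client("sns", region_name="us-east-1")

def publish_one(target):
    endpoint_arn, message = target  # one unique message per device endpoint
    return sns.publish(TargetArn=endpoint_arn, Message=message)

def publish_all(targets, concurrency=10):
    # At most `concurrency` Publish calls in flight at once; with ~1000
    # endpoints this is the 100-batch scenario described above.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(publish_one, targets))
```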

Amazon SNS does not enforce a rate limit on Publish calls. Occasionally SNS will throttle requests, but in that case the service responds with an HTTP 400 and an AWS SNS request ID.
The error message you posted looks like something upstream between you and the SNS endpoint is rate-limiting your calls. Check whether there is a proxy or firewall between you and the SNS endpoint, or talk to your network administrator.
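If you do hit genuine SNS throttling, the SDKs can retry those responses for you. For example, a minimal boto3 sketch using botocore's standard retry options:

```python
import boto3
from botocore.config import Config

# botocore's "standard" and "adaptive" retry modes back off and retry
# throttling errors automatically instead of surfacing them immediately.
sns = boto3.client(
    "sns",
    region_name="us-east-1",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
```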

You can request additional limit increases here:
https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=service-code-sns

I encountered something like this before, and the solution was launching tens of t2.micro or t2.nano instances, because there is also a limit on the number of requests you can make from EC2 to AWS.

The AWS SNS quota for publishing messages ranges from 300 to 30K transactions/sec, depending on the region and on whether the topic is FIFO. If it's FIFO there is also a 10 MB/sec limit, so that may be what is limiting the publishes.
https://docs.aws.amazon.com/general/latest/gr/sns.html

Related

How is a gRPC queue managed? Is there a size limitation for a gRPC queue?

I am trying to understand how gRPC queues are managed and if there are any size limitations on gRPC queue size.
According to this SO post, requests are queued:
If your server already processing maximum_concurrent_rpcs number of requests concurrently, and yet another request is received, the request will be rejected immediately.
If the ThreadPoolExecutor's max_workers is less than maximum_concurrent_rpcs then after all the threads get busy processing requests, the next request will be queued and will be processed when a thread finishes its processing.
According to this GitHub post the queue is managed by the gRPC server:
So maximum_concurrent_rpcs gives you a way to set an upper bound on the number of RPCs waiting in the server's queue to be serviced by a thread.
But this Microsoft post confused me, saying requests are queued on the client:
When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent.
Note, though, that here Microsoft is talking about the connection stream limit. When that limit is reached, a queue is formed on the client.
Are there two types of queues: one created on the server (the gRPC queue) when the limits mentioned above are hit, and another created on the client when the connection stream limit is reached?
And what is the size limit of a gRPC queue? I mean, is it limited only by the underlying hardware (RAM)?
Is there any chance we can get the server to fail because of a huge queue size? Is it possible to limit this queue size?
And if we are talking about two different queues, can we manage and limit the one on the client too?
I am especially interested in Python's point of view.
Thanks!
P.S. I am assuming that when people talk about gRPC queues, they mean a queue created on the server.
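For reference, this is how I currently understand the two server-side knobs in Python (a minimal sketch; the numbers are arbitrary and no servicer is registered):

```python
from concurrent import futures

import grpc

# max_workers: how many RPCs the server actually executes at the same time.
# maximum_concurrent_rpcs: how many RPCs the server will accept at once;
# accepted RPCs beyond max_workers wait in the server-side queue, and anything
# beyond maximum_concurrent_rpcs is rejected immediately with RESOURCE_EXHAUSTED.
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    maximum_concurrent_rpcs=100,
)
# (servicer registration omitted for brevity)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```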

How do retries work in a DataPower mpgw service using routing-url to set the backside URL?

I have a DataPower mpgw service that takes in JSON POST and GET HTTPS requests. Persistent connections are enabled. It sets the backend URL using the DataPower routing-url variable. How do retries work for this? Is there some specific retry setting? Does it retry automatically up to a certain point? What if I don't want it to retry?
The backend app is taking about 1.5 minutes to return a 500 when it can't connect, but I want it to return more quickly. I have the "backside timeout" set to 30 seconds. I'm wondering if that's because it's retrying a couple of times, but I can't find info on how retries work or are configured in this case.
I'm open to more answers, but what I found here looks like it says that with persistent connections enabled, DataPower will retry after the backside timeout duration, up until the duration of the persistent connection timeout.

Amazon SNS rate limiting on HTTPS endpoint

I have an issue with the rate at which Amazon SNS calls our HTTPS endpoint. Our server can't handle that many calls at once and eventually crashes.
The situation
We are sending newsletters with Amazon SES (Simple Email Service). The notifications about bounces / complaints / deliveries are sent to an SNS topic (all the same one).
We are sending the newsletter at a rate of 2,000 emails per minute. This also means that we receive the SNS notifications at a rate of 2,000 per minute. The sending of newsletters and the receiving of SNS notifications are handled by the same server.
The server is already busy sending those newsletters, and in the meantime it must also handle the SNS notifications, which is too much.
So I actually want to limit the rate of the SNS notifications, so that they are sent at a rate of e.g. 500 per minute. I can't find anything like that in the policy.
Can you create an SQS queue and subscribe it to the SNS topic? Then your service can process messages from the queue later, when it has capacity.
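A minimal sketch of that setup with boto3 (the topic ARN, queue name, region, and the ~500/minute pace are placeholders):

```python
import json
import time

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = "arn:aws:sns:eu-west-1:123456789012:ses-notifications"  # placeholder

# 1. Create a queue and allow the SNS topic to deliver to it.
queue_url = sqs.create_queue(QueueName="ses-notifications")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

# 2. Subscribe the queue to the topic instead of the HTTPS endpoint.
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

def handle_notification(body: str) -> None:
    # placeholder for your existing bounce/complaint/delivery handling
    print(body)

# 3. Drain the queue at your own pace (roughly 500 notifications per minute here).
while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        handle_notification(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    time.sleep(1.2)  # ~10 messages every 1.2 s ≈ 500/minute
```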

Apache async HTTP client performance vs sync client

I am trying to switch my application to the async version of the Apache HttpComponents client. The goal is to be able to handle more outbound connections (in the near future).
The payload of the requests is quite small (<5KB)
The endpoints I hit are around 20 in number.
With the sync version of the Apache HTTP client, the throughput is about 200 requests/sec.
The average response time is about 100ms/request.
I abort the requests after a max of 180ms.
After switching to Async, the response time went up by 20ms/request.
The throughput also reduced to 160/sec. The number of aborted requests doubled.
This is after fine tuning the application a lot.
Is there anything I can do to improve the performance of async client?
I set maxConnectionsPerRoute high and have a large connection pool.
Are there any params that are key to getting the most out of async client?
Did you forget to set maxConnTotal?
The default maxConnTotal is 20, and it is a global limit.
I forgot to set it once.

Response not received back to client from Apigee Cloud

Postman client --> Apigee On Cloud --> Apigee On Premise --> Backend
The backend is taking 67 seconds to respond, and I can see the response in Apigee Cloud as well; however, the same response is not sent back to the client and a timeout is received instead.
I have also increased the timeout values in the HTTPTargetConnection properties, but the issue still persists.
Please let us know where to investigate.
There are two levels of timeout in Apigee -- first at the load balancer, which has a 60-second timeout, then at the Apigee layer, which I believe was 30 seconds but looks like it was increased to 60.
My guess is that the timeout response is coming from the load balancer, and that the timing is just such that Apigee is able to get the response but the load balancer has already dropped the connection.
If this is a paid instance you should be able to get Apigee to adjust the timeouts to make this work (but, man... 67,000ms response times are pretty long...)
