Google App Engine Cloud Tasks - 503 Request was aborted after waiting too long to attempt to service your request - google-cloud-tasks

I have a push queue with the following configuration:
Rate limits
-----------------
Max rate - 500/s
Max concurrent - 8
Max burst size - 100
Retry parameters
---------------------------
Max attempts - 3
Min interval - 5s
Max interval - 3600s
Max doublings - 16
Max retry duration - Unlimited
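For reference, a minimal sketch of applying the same limits programmatically with the google-cloud-tasks Python client (project, location, and queue names are placeholders; the exact call shapes can vary between client versions, and max burst size is left to the service here):

from google.cloud import tasks_v2
from google.protobuf import duration_pb2, field_mask_pb2

client = tasks_v2.CloudTasksClient()

queue = tasks_v2.Queue(
    name=client.queue_path("my-project", "us-central1", "my-queue"),  # placeholders
    rate_limits=tasks_v2.RateLimits(
        max_dispatches_per_second=500,   # "Max rate - 500/s"
        max_concurrent_dispatches=8,     # "Max concurrent - 8"
    ),
    retry_config=tasks_v2.RetryConfig(
        max_attempts=3,                                   # "Max attempts - 3"
        min_backoff=duration_pb2.Duration(seconds=5),     # "Min interval - 5s"
        max_backoff=duration_pb2.Duration(seconds=3600),  # "Max interval - 3600s"
        max_doublings=16,                                 # "Max doublings - 16"
        # max_retry_duration left unset -> "Unlimited"
    ),
)

# Update only the rate/retry settings on the existing queue.
mask = field_mask_pb2.FieldMask(paths=["rate_limits", "retry_config"])
client.update_queue(queue=queue, update_mask=mask)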
I have the following service performing tasks from the queue:
runtime: python38
service: tasks
instance_class: B2
basic_scaling:
max_instances: 2
idle_timeout: 10m
inbound_services:
- warmup
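The question doesn't include the handler code. Below is a minimal sketch of such a worker service, assuming Flask, a hypothetical /run-task handler path, and a 60 s sleep mirroring the test setup described under Problem:

# main.py
import time
from flask import Flask, request

app = Flask(__name__)

@app.route("/_ah/warmup")
def warmup():
    # Matches "inbound_services: warmup" in app.yaml above.
    return "", 200

@app.route("/run-task", methods=["POST"])
def run_task():
    payload = request.get_data(as_text=True)  # task body pushed by Cloud Tasks
    time.sleep(60)                             # simulate ~60 s of work per task
    return "done", 200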
Problem:
For the test, I set each task's execution time to ~60s.
Once I put a bunch of tasks into the queue (let's say 1000), some of them fail with the following error: 503 Request was aborted after waiting too long to attempt to service your request. The HTTP request waits for about 30s before returning the 503 error.
What I expected to happen:
Tasks should wait in the queue until the service is ready to perform them; no errors should occur, and no tasks should be lost!
Other information (workarounds you have tried, documentation consulted, etc):
I also see 502 errors in the service logs.
I've tried playing with the queue's Max concurrent parameter, the instance_class, and the scaling types.
If I put a smaller batch of tasks into the queue (~100), no errors occur.
I put tasks into the queue with a simple for loop, like the sketch below.
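A sketch of that loop, assuming the google-cloud-tasks client and the same hypothetical names as above (project, location, queue, and /run-task handler path):

import json
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "my-queue")  # placeholders

for i in range(1000):
    task = tasks_v2.Task(
        app_engine_http_request=tasks_v2.AppEngineHttpRequest(
            http_method=tasks_v2.HttpMethod.POST,
            relative_uri="/run-task",  # assumed handler path
            app_engine_routing=tasks_v2.AppEngineRouting(service="tasks"),
            headers={"Content-Type": "application/json"},
            body=json.dumps({"task_number": i}).encode(),
        )
    )
    client.create_task(parent=parent, task=task)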

Related

What happens when we increase the http.maxTotalConnections for an application?

I have been facing the exception below in my application while calling another service. Upon investigation I was told that requests sitting in the thread pool queue, waiting for an open connection, are reaching the maximum wait time and are therefore being rejected. As a remedy I was asked to tune the total number of connections (http.maxTotalConnections) that my application can use concurrently, to avoid the long wait time for requests.
Our current http.maxTotalConnections value is set to 100.
Below is the error:
exception=reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException:
Pool#acquire(Duration) has been pending for more than the configured
timeout of 50ms
I have the following questions:
How does one decide the right value for http.maxTotalConnections?
How do we find the right balance between a total connection count that is adequate and one so large that it instead starts hurting application performance?
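Not from the question, but a common starting heuristic for sizing a connection pool is Little's Law (connections in use ≈ outbound request rate × average response time), plus some headroom for bursts. A back-of-the-envelope sketch with hypothetical numbers:

# Rough pool sizing via Little's Law; the rate and latency below are
# hypothetical and must be replaced with measured values.
request_rate_per_sec = 400    # peak outbound requests per second (assumed)
avg_response_time_sec = 0.25  # average downstream latency in seconds (assumed)
headroom = 1.5                # safety margin for bursts and slow responses

pool_size = int(request_rate_per_sec * avg_response_time_sec * headroom)
print(pool_size)  # 150 -> a starting point to compare against the current 100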

ELB + auto scaling on AWS, Nginx + .NET Core web API

I am a newbie to AWS load balancing. Currently, I get 504 and 502 errors from the ELB when I run load testing. Normally my API responds in less than 1 second, but when I run 2000 concurrent threads for 10 to 30 minutes, the response time grows steadily up to 1 minute until all my threads are cleared. I checked the CPU utilization of the instances in the auto scaling group, and all instances are below 50 percent. I also checked the monitoring for the target group and the ELB, and 5xx errors are shown. I don't know why my instances are returning 5xx errors to the ELB. All health check statuses are OK. Is this a concern with Nginx or the .NET Core runtime?

How can I increase the healthcheck timeout when launching an instance with Boxfuse?

My instance fails to come up within 60 seconds. How can I increase the timeout?
The error I'm getting both locally and on AWS is:
ERROR: Time out: Payload of Instance vb-312f2f77 failed to come up within 60 seconds at http://127.0.0.1:8888/ !
There are two possible fixes/causes:
Increase the healthcheck timeout. For example, to increase the default of 60 to 120 seconds you could use
boxfuse fuse payload.war -healthcheck.timeout=120
(More info: https://cloudcaptain.sh/docs/commandline/fuse.html#healthcheck.timeout)
Analyse the instance logs to check whether it is a genuine timeout or an application startup issue. You can do this by issuing
boxfuse logs vb-312f2f77
(More info: https://cloudcaptain.sh/docs/commandline/logs.html)

How can I debug buffering with HTTP.sys?

I am running Windows 8.1 and I have an integration test suite that leverages HostableWebCore to spin up isolated ASP.NET web server processes. For performance reasons, I am launching 8 of these at a time and once they are started up I send a very simple web request to each, which is handled by an MVC application loaded into each. Every instance is listening on a different port.
The problem is that the requests are getting held up (I believe) in HTTP.sys (or whatever it is called these days). If I look at fiddler, I can see all 8 requests immediately (within a couple milliseconds) hit the ServerGotRequest state. However, the requests sit in this state for 20-100 seconds, depending on how many I run in parallel at a time.
The reason I suspect this is an HTTP.sys problem is that the amount of time I have to wait for any of them to respond increases with the number of hosting applications I spin up in parallel. If I only launch a single hosting application, it will start responding in ~20 seconds. If I spin up 2, they will both start responding in ~30 seconds. If I spin up 4, ~40 seconds. If I spin up 8, ~100 seconds (which is the default WebClient request timeout).
Because of this long delay, I have enough time to attach a debugger and put a breakpoint in my controller action and that breakpoint will be hit after the 20-100 second delay, suggesting that my process hasn't yet received the request. All of the hosts are sitting idle for those 20-100 seconds after ~5-10 seconds of cold start CPU churning. All of the hosts appear to receive the requests at the same time, as if something was blocking any request from going through and then all of a sudden let everything through.
My problem is, I have been unable to locate any information related to how one can debug HTTP.sys. How can I see what it is doing? What is causing the block? Why is it waiting to forward on the requests to the workers? Why do they all come through together?
Alternatively, if someone has any idea how I can work around this and get the requests to come through immediately (without the waiting) I would very much appreciate it.
Another note: I can see System (PID 4) immediately register to listen on the port I have specified as soon as the hosting applications launch.
Additional Information:
This is what one of my hosting apps looks like under netsh http show servicestate
Server session ID: FD0000012000004C
Version: 2.0
State: Active
Properties:
Max bandwidth: 4294967295
Timeouts:
Entity body timeout (secs): 120
Drain entity body timeout (secs): 120
Request queue timeout (secs): 120
Idle connection timeout (secs): 120
Header wait timeout (secs): 120
Minimum send rate (bytes/sec): 150
URL groups:
URL group ID: FB00000140000018
State: Active
Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
Properties:
Max bandwidth: inherited
Max connections: inherited
Timeouts:
Timeout values inherited
Logging information:
Log directory: C:\inetpub\logs\LogFiles\W3SVC1
Log format: 0
Number of registered URLs: 2
Registered URLs:
HTTP://LOCALHOST:10451/
HTTP://*:10451/
Request queue name: IntegrationTestAppPool10451{974E3BB1-7774-432B-98DB-99850825B023}
Version: 2.0
State: Active
Request queue 503 verbosity level: Basic
Max requests: 1000
Number of active processes attached: 1
Controller process ID: 12812
Process IDs:
12812
Answering this mainly for posterity. It turns out that my problem wasn't HTTP.sys; it was ASP.NET. It takes a shared lock when it compiles files, and this lock is identified by System.Web.HttpRuntime.AppDomainAppId. I believe that since all of my apps are built dynamically from a common applicationHost.config file, they all have the same AppDomainAppId (/LM/W3SVC/1/ROOT). This means they all share one lock, so page compilation effectively happens sequentially across all of the apps. Because the apps keep contending for the same lock, none of them gets to the end of compilation quickly, so they all tend to finish at around the same time: once one of them makes it through, the others are likely close behind and finish just after.

Call to slow service over HTTP from within message-driven bean (MDB)

I have a message-driven bean which serves messages in the following way:
1. It takes data from the incoming message.
2. It calls an external service via HTTP (literally, it sends GET requests using HttpURLConnection), using the data from step 1. No matter how long the call takes, the message MUST NOT be dropped.
3. It uses the outcome from step 2 to persist data (using entity beans).
The rate of incoming messages is:
I. Low most of the time: on the order of units/tens per day.
II. Sometimes high: on the order of hundreds in a few minutes.
QUESTION:
Given that the service in step 2 is relatively slow (20 seconds per request, and it degrades as the workload increases), what is the best way to deal with situation II?
WHAT I TRIED:
1. Letting the MDB wait until the service call completes, no matter how long it takes. This tends to roll back MDB transactions on timeout and re-deliver the messages, increasing the workload and making things even worse.
2. Setting a timeout on the HttpURLConnection gives some guarantees about the completion time of the MDB's onMessage() method, but leaves an open question: how to proceed with 'timed out' messages.
Any ideas are very much appreciated.
Thank you!
In that case you can just increase the transaction timeout for your message-driven beans.
This is what I ended up with (mostly, this is application server configuration):
A relatively short timeout (compared to the transaction timeout) for the HTTP call. The rationale: in my experience, long-running transactions tend to have adverse side effects, such as threads that look "hung" from the application server's point of view, the need for extra attention to database configuration, etc. I chose 80 seconds as the timeout value.
A re-delivery interval for failed messages increased to several minutes.
Careful adjustment of the number of threads that handle messages simultaneously. I balanced this value against the throughput of the HTTP service.
