Clients connect to my Nginx instance with the keep-alive of 15s. I set worker_shutdown_timeout to 30s, and server keep-alive to 90s.
When I send -HUP signal or using Nginx -s reload to my instance. It creates new workers and immediately shuts down old workers. This causes my clients to get 499 EOFs.
Any idea what am I doing wrong?
Keepalive connections are closed immediately regardless of the worker_shutdown_timeout value, as clients are expected to re-open them as needed. The worker_shutdown_timeout applies to connections with actual requests being processed - these requests will be terminated when shutdown timeout expires.
If your clients cannot handle keepalive connection being closed by the server, probably there is room for improvement in the client code.
Related
I'm seeing a weird situation where either Nginx or uwsgi seems to be building up a long queue of incoming requests, and attempting to process them long after the client connection timed out. I'd like to understand and stop that behavior. Here's more info:
My Setup
My server uses Nginx to pass HTTPS POST requests to uWSGI and Flask via a Unix file socket. I have basically the default configurations on everything.
I have a Python client sending 3 requests per second to that server.
The Problem
After running the client for about 4 hours, the client machine started reporting that all the connections were timing out. (It uses the Python requests library with a 7-second timeout.) About 10 minutes later, the behavior changed: the connections began failing with 502 Bad Gateway.
I powered off the client. But for about 10 minutes AFTER powering off the client, the server-side uWSGI logs showed uWSGI attempting to answer requests from that client! And top showed uWSGI using 100% CPU (25% per worker).
During those 10 minutes, each uwsgi.log entry looked like this:
Thu May 25 07:36:37 2017 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/polldata (ip 98.210.18.212) !!!
Thu May 25 07:36:37 2017 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] during POST /api/polldata (98.210.18.212)
IOError: write error
[pid: 34|app: 0|req: 645/12472] 98.210.18.212 () {42 vars in 588 bytes} [Thu May 25 07:36:08 2017] POST /api/polldata => generated 0 bytes in 28345 msecs (HTTP/1.1 200) 2 headers in 0 bytes (0 switches on core 0)
And the Nginx error.log shows a lot of this:
2017/05/25 08:10:29 [error] 36#36: *35037 connect() to unix:/srv/my_server/myproject.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 98.210.18.212, server: example.com, request: "POST /api/polldata HTTP/1.1", upstream: "uwsgi://unix:/srv/my_server/myproject.sock:", host: "example.com:5000"
After about 10 minutes the uWSGI activity stops. When I turn the client back on, Nginx happily accepts the POST requests, but uWSGI gives the same "writing to a closed pipe" error on every request, as if it's permanently broken somehow. Restarting the webserver's docker container does not fix the problem, but rebooting the host machine fixes it.
Theories
In the default Nginx -> socket -> uWSGI configuration, is there a long queue of requests with no timeout? I looked in the uWSGI docs and I saw a bunch of configurable timeouts, but all default to around 60 seconds, so I can't understand how I'm seeing 10-minute-old requests being handled. I haven't changed any default timeout settings.
The application uses almost all the 1GB RAM in my small dev server, so I think resource limits may be triggering the behavior.
Either way, I'd like to change my configuration so that requests > 30 seconds old get dropped with a 500 error, rather than getting processed by uWSGI. I'd appreciate any advice on how to do that, and theories on what's happening.
This appears to be an issue downstream on the uWSGI side.
It sounds like your backend code may be faulty in that it takes too long to process the requests, does not implement any sort of rate limiting for the requests, and does not properly catch if any of the underlying connections have been terminated (hence, you're receiving the errors that your code tries to write to closed pipelines, and possibly even start processing new requests long after the underlying connections have been terminated).
As per http://lists.unbit.it/pipermail/uwsgi/2013-February/005362.html, you might want to abort processing within your backend if not uwsgi.is_connected(uwsgi.connection_fd()).
You might want to explore https://uwsgi-docs.readthedocs.io/en/latest/Options.html#harakiri.
As last resort, as per Re: Understanding "proxy_ignore_client_abort" functionality (2014), you might want to change uwsgi_ignore_client_abort from off to on in order to not drop the ongoing uWSGI connections that have already been passed to the upstream (even if the client does subsequently disconnect) in order to not receive the closed pipe errors from uWSGI, as well as to enforce any possible concurrent connection limits within nginx itself (otherwise, the connections to uWSGI will get dropped by nginx should the client disconnect, and nginx would have no clue how many requests are being queued up within uWSGI for subsequent processing).
Seems like DoS attack on Nginx uWSGI returning 100% CPU usage with Nginx 502, 504, 500. IP spoofing is common in DoS attack. Exclude by checking the logs.
Is there a way to keep idle connection open after a specified timeout at the server side?
I have a http server implemented using golang1.4 and an API will respond after 10 seconds. While I set MaxIdleConnsPerHost at the client (using golang1.4), I still get the read tcp xx.xx.xx.xx:xx: use of closed network connection. I think it may be caused by server closing the idle connection.
I found GOLANG, HTTP having “use of closed network connection” error and http.Server: timeout for idle connections only?, but they didn't help.
At the server side, set ReadTimeout may help.
Ref: Creating an idle timeout in Go?
Is my reasoning about HTTP Keep-Alive correct? Here are my assumptions:
In modern browsers, persistent connections are the default, so the Connection: Keep-Alive response header isn't likely to be sent by HAProxy for those clients. Not seeing it does not necessarily mean that the connection is not persistent.
A Keep-Alive is meant to be a short duration, for example a few hundred milliseconds, during which the client can reuse the same connection to the same server (for uses such as downloading images, CSS and JavaScript that are on the same page)
Longer persistence can be accomplished by cookies or stick tables
The reason I ask is because when I set a long keep-alive timeout (via timeout http-keep-alive), typing F5 rapidly loads a different backend server just as fast, rotating through all three without any effect from the keep-alive. I would have thought that if I hit F5 during the keep-alive timeout period of five seconds, I'd still get the same backend server.
Am I thinking about this wrong?
Here is my HAproxy config where I've set the keep-alive timeout to 5 seconds:
defaults
timeout connect 5s
timeout client 120s
timeout server 120s
timeout http-request 5s
timeout http-keep-alive 5s
option http-keep-alive
frontend http
mode http
bind *:80
default_backend servers1
backend servers1
mode http
balance roundrobin
server web1 web1:80 check
server web2 web2:80 check
server web3 web3:80 check
UPDATE
Using Wireshark, it looks like, from the client perspective, the client is reusing the same socket connection to HAProxy's frontend until the timeout expires. So the keep-alive appears to be working between client and frontend. However, I added an image to the Web page to see which backend server would serve it, and if it would be a different server than the one that served the HTML...and it was different. In roundrobin mode, two different backend servers served up content to the same client, even though I had keep-alive enabled.
This all seems to me like keep-alive is not working between frontend and backend. This is the behavior I would expect if I had set option http-server-close, but not when using option http-keep-alive which should keep the connection to the server open.
you should ensure your server does not close the connection.
you should enable the "option prefer-last-server", in your HAProxy's defaults section.
you should re-read the "option http-keep-alive" documentation of HAProxy: http://cbonte.github.io/haproxy-dconv/configuration-1.6.html#option%20http-keep-alive.
It clearly states that:
If the client request has to go to another backend or another server due to content switching or the load balancing algorithm, the idle connection will immediately be closed and a new one re-opened. Option "prefer-last-server" is available to try optimize server selection so that if the server currently attached to an idle connection is usable, it will be used.
(hence point 2)
So your observations looks normal.
I plan to use nginx for proxying websockets. When performing nginx reload / HUP , I understand that nginx waits for the old worker processes to stop processing all requests. In websocket connection however, this may not happen for long time as the connection is persistent. Is there an option / roadmap to forceibly kill old worker process after timeout on reload?
References:
http://nginx.org/en/docs/control.html
http://forum.nginx.org/read.php?21,247573,247651#msg-247651
Thanks
Unless you have either solution: proxy_read_timeout 1d or a ping message to keep connection alive, Nginx closes connections in 60sec otherwise. This default value was chosen by a reason.
See what Nginx core developer says:
There is proxy_read_timeout (http://nginx.org/r/proxy_read_timeout)
which as well applies to WebSocket connections. You have to bump it
if your backend do not send anything for a long time. Alternatively,
you may configure your backend to send websocket ping frames
periodically to reset the timeout (and check if the connection is
still alive).
Having said that nothing should stop you from using USR2+QUIT signals combination that usually used when you gracefully restart Nginx while binary upgrade. Nginx master/worker processes rare consume more than 50MB of memory, so to keep multiple masters isn't that expensive. USR2 helps to fork new master and spawn its workers followed by gracefully shutdown old workers and master.
Since haproxy v1.5.0 it was possible to temporarily stop reverse-proxying traffic to frontends using
set maxconn frontend <frontend_name> 0
command.
I've noticed that if haproxy is configured to maintain keepalive connections between haproxy and a client then said connections will continue be served whereas the new ones will continue awaiting for "un-pausing" a frontend.
The question is: is it possible to terminate current keepalive connections gracefully so that a client was required to establish new connections?
I've only found shutdown session and shutdown sessions commands but they are obviously not graceful at all.
The purpose of all of this is to make some changes on server seamlessly, otherwise in current configuration it would require a scheduled maintenance window.