Does nginx have a soft quit?

Does anyone know if nginx supports a soft quit? Meaning, does it stay running until all connections are either closed or timed out (past a specific time interval), while not allowing any new connections during that period?
For example:
nginx stop
nginx running (2 connections active and blocking any new connections)
nginx running (1 connection active)
nginx stopped (0 connections active)

man nginx

    -s signal   Send signal to the master process. The argument signal can be
                one of: stop, quit, reopen, reload.
                The following table shows the corresponding system signals.

                stop     SIGTERM
                quit     SIGQUIT
                reopen   SIGUSR1
                reload   SIGHUP
Specifically, you want SIGQUIT. In layperson's terms:
stop — fast shutdown
quit — graceful shutdown
reload — reloading the configuration file
reopen — reopening the log files
See also: http://nginx.org/en/docs/control.html for details, and http://nginx.org/en/docs/beginners_guide.html#control for a quick reference.
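If you want to trigger the graceful shutdown yourself rather than going through nginx -s quit, here is a minimal Go sketch that sends SIGQUIT to the master process. The pid file path below is an assumption; check the pid directive in your nginx.conf:

    package main

    import (
        "log"
        "os"
        "strconv"
        "strings"
        "syscall"
    )

    func main() {
        // Assumed default pid file location; nginx writes its master pid here.
        data, err := os.ReadFile("/var/run/nginx.pid")
        if err != nil {
            log.Fatal(err)
        }
        pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
        if err != nil {
            log.Fatal(err)
        }
        // SIGQUIT asks the master for a graceful shutdown: stop accepting new
        // connections, let in-flight requests finish, then exit.
        if err := syscall.Kill(pid, syscall.SIGQUIT); err != nil {
            log.Fatal(err)
        }
    }

This is exactly the "soft quit" sequence from the question: no new connections are accepted, and the process exits once the active ones are gone.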

Related

How to find root cause for server shut down frequently

I'm running Apache on Ubuntu. Lately it shuts down frequently and comes back up after a restart. I keep analyzing the apache2 error log to find the cause. Previously it reported a PHP code error, but after fixing that, it now throws different errors.
What can I conclude from these errors? Which of them probably caused the downtime, and how do I fix it?
AH: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
AH00687: Negotiation: discovered file(s) matching request: /opt/bitnami/apps/file_that_doesn't-exist
(70007)The timeout specified has expired: [client 148.251.79.134:60170] AH01075: Error dispatching request to : (polling)
AH00045: child process 5062 still did not exit, sending a SIGTERM
AH00046: child process 5299 still did not exit, sending a SIGKILL
AH01909: localhost:443:0 server certificate does NOT include an ID which matches the server name
I've done enough Google searching to understand each of these errors. Most importantly, I would like to know which of these errors would have caused the server to go down, and what is the way to fix it?
Bitnami Engineer here,
AH: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
It seems the server is hitting its limits, and that may be the reason for the issues you are running into. It can hit those limits either because the instance is too small (so you need to increase its memory/CPU) or because you are being attacked.
You can check whether you are being attacked by running these commands:
cd /opt/bitnami/apache2/logs/
tail -n 10000 access_log | awk '{print $1}'| sort| uniq -c| sort -nr| head -n 10
Are those IPs familiar? Is there any IP that is requesting your site too many times?
You can find more information about how to block it here.
You can also increase the MaxRequestWorkers parameter in Apache by editing the /opt/bitnami/apache2/conf/bitnami/httpd.conf file, or increase the instance type using the AWS console so the server has more resources from now on.
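If you prefer a small program over remembering the shell pipeline, here is a rough Go equivalent of the top-talkers check above. It assumes the common/combined log format where the client IP is the first whitespace-separated field, and it scans the whole file rather than only the last 10,000 lines:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "sort"
        "strings"
    )

    func main() {
        // Path taken from the answer above; adjust for your installation.
        f, err := os.Open("/opt/bitnami/apache2/logs/access_log")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Count requests per client IP (first field of each log line).
        counts := map[string]int{}
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            if fields := strings.Fields(sc.Text()); len(fields) > 0 {
                counts[fields[0]]++
            }
        }

        // Sort descending and print the top 10 talkers.
        type kv struct {
            ip string
            n  int
        }
        top := make([]kv, 0, len(counts))
        for ip, n := range counts {
            top = append(top, kv{ip, n})
        }
        sort.Slice(top, func(i, j int) bool { return top[i].n > top[j].n })
        for i := 0; i < len(top) && i < 10; i++ {
            fmt.Println(top[i].n, top[i].ip)
        }
    }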

How does nginx reload work? Why is it zero-downtime?

According to the official nginx docs, the reload command reloads the configuration files, and during the process there is no downtime of the service.
I've learned that it waits for requests that are already connected until they finish, and stops accepting any new requests. The idea is cool, but how does it deal with keep-alive connections? Those long-lived connections won't close, and continuous requests keep arriving over them.
Here's the summary:
http://nginx.org/en/docs/control.html
The master process first checks the syntax validity, then tries to apply new configuration. If this succeeds, it starts new worker processes, and sends messages to old worker processes requesting them to shut down gracefully.
That means it would keep older processes handling unclosed connections while having new processes working according to the updated configuration.
From this perspective connections with keep-alive are no different from other unclosed connections.
In versions prior to 1.11.11 such "old" processes could hang indefinitely (according to @Alexey; I haven't checked it myself). From 1.11.11 on there's a configuration setting, worker_shutdown_timeout, that controls this:
http://nginx.org/en/docs/ngx_core_module.html#worker_shutdown_timeout
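The same drain-with-a-deadline idea is easy to see in application code. As a hedged illustration (not how nginx is implemented), Go's net/http server treats idle keep-alive connections the same way during a graceful shutdown, and the context deadline below plays the role of worker_shutdown_timeout; the port, signal, and 30-second value are arbitrary:

    package main

    import (
        "context"
        "log"
        "net/http"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        srv := &http.Server{Addr: ":8080"}
        go func() {
            if err := srv.ListenAndServe(); err != http.ErrServerClosed {
                log.Fatal(err)
            }
        }()

        // Wait for a shutdown request, the way an old nginx worker waits
        // for the master's message.
        quit := make(chan os.Signal, 1)
        signal.Notify(quit, syscall.SIGQUIT)
        <-quit

        // Shutdown stops the listener, lets in-flight requests finish, and
        // closes keep-alive connections as soon as they go idle. The deadline
        // is the analogue of worker_shutdown_timeout: when it expires,
        // whatever is still open gets closed forcibly.
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            srv.Close() // hard-close the stragglers
        }
    }

A keep-alive connection that is between requests counts as idle and is closed right away; one in the middle of a request is allowed to finish that request first, which matches the behaviour described above.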

What happens to a waiting WebSocket connection on a TCP level when server is busy (blocked)

I am load testing my WebSocket Tornado server, running on Ubuntu Server 14.04.
I am playing with a big client machine loading 60,000 users, 150 a second (that's what my small server can comfortably take). Client is a RedHat machine. When a load test suite finishes, I have to wait a few seconds to be able to rerun.
Within these few seconds, my websocket server is handling the closing of the 60,000 connections. I can see it in my graphite dashboard (the server logs every connect and disconnect there).
I am also logging relevant outputs of the netstat -s and ss -s commands to my graphite dashboard. When the test suite finishes, I can immediately see the count of established TCP sockets drop from 60,000 to ~0. Other socket states (closed, timewait, synrecv, orphaned) remain constant and very low. My client's sockets go to timewait for a short period, and then that number goes to 0 too. When I immediately rerun the suite, all the TCP sockets on both ends are free but the server has not yet finished processing the previous closing batch, and I see no changes at the TCP socket level until the server finishes and starts accepting new connections again.
My question is - where is the information about the sockets waiting to be established stored (RedHat and Ubuntu)? No counter/queue length that I am tracking shows this.
Thanks in advance.

Golang: how to handle graceful shutdown with keep-alives

I have built a proxy server that can balance between multiple nodes.
I also made it reload with zero downtime. The problem is that most of the nodes have keep-alive connections, and I have no clue how to handle these. Sometimes the server can't shut down because of 1 or 2 open connections that won't close.
My first thought was to set a timeout on the shutdown, but that doesn't guarantee that every connection is terminated correctly. Think of a download that takes some minutes to complete.
Can anyone give me some good advice on what to do in this case?
One option you have is to initially shut down just the listening sockets, and wait on the active connections before exiting.
Once you free up the listening sockets, your new process is free to start up and accept new connections. The old process can then continue running until all its connections are closed gracefully (this is how HAProxy does reloads), or until some far longer timeout if you choose.
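In Go terms (since the question is about a Go proxy), that maps to closing the listener first and then waiting on the active connections. Here is a minimal sketch of the pattern; the port, the signal choice, and the handle function are placeholders, not your actual proxy code:

    package main

    import (
        "io"
        "log"
        "net"
        "os"
        "os/signal"
        "sync"
        "syscall"
    )

    func main() {
        ln, err := net.Listen("tcp", ":9000")
        if err != nil {
            log.Fatal(err)
        }

        var wg sync.WaitGroup
        go func() {
            for {
                conn, err := ln.Accept()
                if err != nil {
                    return // listener closed: stop accepting, existing conns live on
                }
                wg.Add(1)
                go func(c net.Conn) {
                    defer wg.Done()
                    defer c.Close()
                    handle(c)
                }(conn)
            }
        }()

        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGTERM)
        <-stop

        // Step 1: free the listening socket so the new process can bind it.
        ln.Close()
        // Step 2: wait for every active connection, including a download that
        // takes minutes, to finish on its own.
        wg.Wait()
    }

    // handle is a hypothetical placeholder; a real proxy would dial the
    // backend and copy bytes in both directions.
    func handle(c net.Conn) {
        io.Copy(io.Discard, c)
    }

Note that wg.Wait() has no deadline here, so nothing gets cut off mid-transfer; if you do want an upper bound, race the wait against a timer and accept that whatever is still open at that point gets dropped.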

How can Nginx be upgraded without dropping any requests?

According to the Nginx documentation:
If you need to replace nginx binary with a new one (when upgrading to a new version or adding/removing server modules), you can do it without any service downtime - no incoming requests will be lost.
My coworker and I were trying to figure out: how does that work? We know (we think) that:
Only one process can be listening on port 80 at a time
Nginx creates a socket and connects it to port 80
A parent process and any of its children can all accept() on the same inherited socket, which is how Nginx can have multiple worker children responding to requests
We also did some experiments with Nginx, like this:
Send a kill -USR2 to the current master process
Repeatedly run ps -ef | grep unicorn to see any unicorn processes, with their own pids and their parent pids
Observe that the new master process is, at first, a child of the old master process, but when the old master process is gone, the new master process has a ppid of 1.
So apparently the new master process can listen to the same socket as the old one while they're both running, because at that time, the new master is a child of the old master. But somehow the new master process can then become... um... nobody's child?
I assume this is standard Unix stuff, but my understanding of processes and ports and sockets is pretty darn fuzzy. Can anybody explain this in better detail? Are any of our assumptions wrong? And is there a book I can read to really grok this stuff?
For specifics: http://www.csc.villanova.edu/~mdamian/Sockets/TcpSockets.htm describes the C library for TCP sockets.
I think the key is that after a process forks while holding a socket file descriptor, the parent and child are both able to call accept() on it.
So here's the flow. Nginx, started normally:
Calls socket() and bind() and listen() to set up a socket, referenced by a file descriptor (integer).
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Then Nginx forks. The parent keeps running as usual, but the child immediately execs the new binary. exec() wipes out the old program, memory, and running threads, but open file descriptors survive it: see http://linux.die.net/man/2/execve. I suspect the exec() call passes the number of the open file descriptor as a command line parameter.
The child, started as part of an upgrade:
Reads the open file descriptor's number from the command line.
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Tells the parent to drain (stop accept()ing, and finish existing connections), and to die.
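The fd-inheritance half of that flow is straightforward to demonstrate in Go. The sketch below shows only the mechanism, not nginx itself (nginx actually passes the listening fd numbers to the new binary in the NGINX environment variable rather than as a command line parameter); the port and the INHERITED_FD variable are invented for the example:

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/http"
        "os"
        "os/exec"
    )

    func main() {
        if os.Getenv("INHERITED_FD") != "" {
            child()
            return
        }
        parent()
    }

    func parent() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }
        // Pull out the underlying *os.File so the descriptor survives exec.
        f, err := ln.(*net.TCPListener).File()
        if err != nil {
            log.Fatal(err)
        }
        cmd := exec.Command(os.Args[0])                  // "upgrade" to the same binary
        cmd.Env = append(os.Environ(), "INHERITED_FD=3") // ExtraFiles start at fd 3
        cmd.ExtraFiles = []*os.File{f}
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Start(); err != nil {
            log.Fatal(err)
        }
        // Here the old process would drain and exit; until then, both
        // processes can accept() on the same socket.
        cmd.Wait()
    }

    func child() {
        // Rebuild a net.Listener from the inherited descriptor.
        ln, err := net.FileListener(os.NewFile(3, "inherited-listener"))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("child accepting on the inherited socket")
        log.Fatal(http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "served by the new process")
        })))
    }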
I have no idea how nginx does it, but basically it could just exec the new binary, carrying the listening socket with it into the new process (actually it remains the same process; it just replaces the program executing in it). The listening socket has a backlog of incoming connections, and as long as the new binary boots up quickly enough, it should be able to start processing them before the backlog overflows. If not, it could fork first, exec, wait for the child to boot up to the point where it's ready to process incoming requests, then hand over command of the listening socket (file descriptors are inherited when forking, so both have access to it) via some internal mechanism before exiting. Given your observations, this looks like what it's doing: if your parent process dies, your ppid is reassigned to init, i.e. pid 1.
If it has multiple processes competing to accept() on the same listening socket (again, I have no idea how nginx does it; perhaps it has a dispatching process?), then you could replace them one by one by ordering them to exec the new program, as above, but one at a time, so as to never drop the ball. Note that during such a process there would never be any new pids or parent/child relationship changes.
At least, I think that's probably how I would do it, off the top of my head.
