How does nginx reload work, and why is it zero-downtime?

According to the official nginx docs, the reload command reloads the configuration files, and there is no service downtime during the process.
I've learned that it waits for already-connected requests to finish and stops accepting new ones. The idea is cool, but how does it deal with keep-alive connections? Those long-lived connections won't close, and new requests keep arriving on them.

Here's the summary:
http://nginx.org/en/docs/control.html
The master process first checks the syntax validity, then tries to apply new configuration. If this succeeds, it starts new worker processes, and sends messages to old worker processes requesting them to shut down gracefully.
That means it keeps the old processes handling unclosed connections while the new processes work according to the updated configuration.
From this perspective, keep-alive connections are no different from other unclosed connections.
In versions prior to 1.11.11, such "old" processes could hang indefinitely (according to Alexey; I haven't checked it myself). Since 1.11.11 there is a configuration directive controlling this:
http://nginx.org/en/docs/ngx_core_module.html#worker_shutdown_timeout
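The reload sequence described above can be made concrete with a small JavaScript model. It is a toy, not nginx's C implementation, and `Master`, `Worker` and their methods are hypothetical names. Old workers stop accepting, keep serving the connections they already hold (keep-alive ones included), and are removed once those close or once a deadline playing the role of worker_shutdown_timeout passes. As far as I know, current nginx also closes idle keep-alive connections as soon as a worker starts its graceful shutdown; the model leaves that out and treats every open connection alike, as the answer does.

```javascript
// Toy model of an nginx-style reload (hypothetical names; time is a plain number of seconds).
class Worker {
  constructor(id, config) {
    this.id = id;
    this.config = config;     // configuration this worker was started with
    this.accepting = true;    // still taking new connections?
    this.open = new Set();    // connections it currently holds, keep-alive ones included
    this.deadline = Infinity; // when it gets force-closed once shutdown has started
  }
}

class Master {
  constructor(config, workerCount = 2, shutdownTimeout = Infinity) {
    this.nextId = 1;
    this.workerCount = workerCount;
    this.shutdownTimeout = shutdownTimeout; // stands in for worker_shutdown_timeout
    this.workers = this.spawn(config);
  }

  spawn(config) {
    return Array.from({ length: this.workerCount }, () => new Worker(this.nextId++, config));
  }

  // Check the new config, start new workers, then ask the old ones to shut down gracefully.
  reload(newConfig, now) {
    if (newConfig == null) return false; // "syntax check" failed: old workers keep serving
    for (const w of this.workers) {
      if (w.accepting) {
        w.accepting = false; // graceful shutdown: no new connections
        w.deadline = now + this.shutdownTimeout;
      }
    }
    this.workers.push(...this.spawn(newConfig));
    return true;
  }

  // New connections only go to workers that are still accepting.
  connect(connId) {
    const w = this.workers.find((x) => x.accepting);
    w.open.add(connId);
    return w;
  }

  close(connId) {
    for (const w of this.workers) w.open.delete(connId);
  }

  // An old worker exits once its connections are gone, or when its deadline passes.
  tick(now) {
    this.workers = this.workers.filter(
      (w) => w.accepting || (w.open.size > 0 && now < w.deadline)
    );
  }
}
```

With the default `shutdownTimeout` of `Infinity`, an old worker lives as long as its longest connection, which mirrors the indefinite hang mentioned above.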

Related

Handle server shutdown while serving an HTTP request

Scenario: the server is in the middle of processing an HTTP request when it shuts down. The code may have executed up to any of several points. How are such cases typically handled? A typical example is a request that has to make some downstream HTTP calls as part of handling the incoming one: how can you find out which of those calls were or were not made when the shutdown occurred? I assume it's not possible to persist every action in the code flow. Suggestions and views are welcome.

There are two kinds of shutdowns to consider here.
There are graceful shutdowns: when the execution environment politely asks your process to stop (e.g. systemd sends a SIGTERM) and expects it to exit on its own. If your process doesn’t exit within a few seconds, the environment proceeds to kill the process in a more forceful way.
A typical way to handle a graceful shutdown is:
listen for the signal from the environment
when you receive the signal, stop accepting new requests...
...and then wait for all current requests to finish
Exactly how you do this depends on your platform/framework. For instance, Go’s standard net/http library provides a Server.Shutdown method.
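A Node.js counterpart of the three steps listed above can be sketched as follows. It is a minimal sketch assuming Node 18.2 or later (for `server.closeIdleConnections()`); `installGracefulShutdown`, the injectable `proc` parameter, and the 10-second grace period are hypothetical choices, the grace period standing in for the environment's forceful kill.

```javascript
// Hypothetical helper: on SIGTERM, stop accepting, let in-flight requests finish, then exit.
// `proc` defaults to the real process object and is injectable so the logic can be tested.
function installGracefulShutdown(server, { graceMs = 10000, proc = process } = {}) {
  let shuttingDown = false;

  // 1. Listen for the signal from the environment.
  proc.on('SIGTERM', () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;

    // 2. Stop accepting new connections; the callback runs once every connection has closed.
    server.close(() => proc.exit(0));

    // Idle keep-alive sockets could keep close() waiting until their keep-alive
    // timeout expires, so drop them right away.
    server.closeIdleConnections();

    // 3. Wait for current requests, but not forever: give up after the grace period.
    setTimeout(() => proc.exit(1), graceMs).unref();
  });
}

// Usage (not run here):
// const http = require('http');
// installGracefulShutdown(http.createServer(handler).listen(8080));
```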
In a typical system, most shutdowns will be graceful. For example, when you need to restart your process to deploy a new version of code, you do a graceful shutdown.
There can also be unexpected shutdowns: e.g. when you suddenly lose power or network connectivity (a disconnected server is usually as good as a dead one). Such faults are harder to deal with. There’s an entire body of research dedicated to making distributed systems robust to arbitrary faults. In the simple case, when your server only writes to a single database, you can open a transaction at the beginning of a request and commit it before returning the response. This will guarantee that either all the changes are saved to the database or none of them are. But if you call multiple downstream services as part of one upstream HTTP request, you need to coordinate them, for example, with a saga.
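The saga idea can be sketched as a list of steps, each paired with a compensating action that undoes it; if a step fails, the steps that already succeeded are compensated in reverse order. This is a minimal in-process sketch with hypothetical step names. On its own it does not survive a crash: a real saga persists its progress (or runs under an orchestrator) so it can resume or compensate after the process dies, and it also has to handle failing compensations, both of which this omits.

```javascript
// Minimal in-process saga: run steps in order; if one fails, run the compensating
// action of every step that already succeeded, newest first.
// Each step is { name, run(), compensate() }.
function runSaga(steps) {
  const done = [];
  const log = [];
  for (const current of steps) {
    try {
      current.run();
    } catch (err) {
      log.push(`failed:${current.name}`);
      for (const prev of [...done].reverse()) {
        prev.compensate(); // e.g. refund a charge, release a reservation
        log.push(`undo:${prev.name}`);
      }
      return { ok: false, log };
    }
    done.push(current);
    log.push(`ok:${current.name}`);
  }
  return { ok: true, log };
}

// Hypothetical downstream calls made while handling one upstream request.
const step = (name, fails = false) => ({
  name,
  run() {
    if (fails) throw new Error(`${name} unavailable`);
  },
  compensate() {},
});
```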
For some applications, it may be OK to ignore unexpected shutdowns and simply deal with any inconsistencies manually if/when they arise. This depends on your application.

JBoss preventing keep-alive when no more threads are available

After experimenting with my JBoss 5.1 server I noticed that the HTTP responses contain the Connection: close header if the current thread is the last available one.
For instance, if I set maxThreads="4" in the HTTP connector config and perform more than 4 simultaneous requests, then:
the first 3 responses do not contain any Connection header (meaning the client can reuse the connection for future requests)
all subsequent responses contain the Connection: close header (meaning the client will have to open a new connection, from a different local port, for the next request)
I could not find any documentation for this. Is this behaviour explained somewhere? And is it possible to avoid it (i.e. prevent this Connection: close header) so that clients can reuse the sockets for future requests?

I had a quick look at the Tomcat code (JBossWeb, the web container of JBoss, is based on Tomcat).
It shows that Http11Processor doesn't return from its process method as long as the connection is allowed to be kept alive. So a kept-alive connection holds a thread from the HTTP pool for as long as the connection is open.
To prevent the pool from being drained by inactive kept-alive connections, the thread pool most probably disables keep-alive for the last thread in its pool before it starts processing the new request (I have spotted some code in the PooledSender that may do this). Otherwise it would be too easy to block Tomcat/JBoss by opening a limited number of kept-alive connections.
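The mechanism the answer suspects can be illustrated with a toy model of a blocking, thread-per-connection connector: every kept-alive connection pins a thread, so the request that takes the last free thread is answered with `Connection: close`. This is a sketch of the idea only, not Tomcat's or JBoss Web's code; `BlockingConnector` is a hypothetical name and the model ignores queued requests.

```javascript
// Toy model of a blocking, thread-per-connection HTTP connector (hypothetical, not Tomcat code).
class BlockingConnector {
  constructor(maxThreads) {
    this.maxThreads = maxThreads;
    this.pinned = 0; // threads held by open keep-alive connections
  }

  // Serve one request on a new connection; returns the Connection header value sent, or null.
  handle() {
    if (this.pinned + 1 < this.maxThreads) {
      this.pinned++; // thread stays attached to this connection after the response
      return null;   // no header: the client may reuse the connection
    }
    // This request took the last free thread: refuse keep-alive so the thread is
    // returned to the pool instead of waiting on a possibly idle connection.
    return 'close';
  }
}
```

With `maxThreads = 4` the model reproduces the pattern from the question: the first three responses carry no Connection header and every later one gets `close`.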

Golang: how to handle graceful shutdown with keep-alives

I have built a proxy server that can balance between multiple nodes.
I also made it able to reload with zero downtime. The problem is that most of the nodes use keep-alive connections and I have no clue how to handle these. Sometimes the server can't shut down because of 1 or 2 open connections that won't close.
My first idea is to set a timeout on the shutdown, but that does not guarantee that every connection is terminated correctly. Think of a download that takes a few minutes to complete.
Can anyone give me some good advice on what to do in this case?

One option you have is to shut down just the listening sockets first, and wait for the active connections to finish before exiting.
Once you free up the listening sockets, your new process is free to start up and accept new connections. The old process can then keep running until all its connections are closed gracefully (this is how HAProxy does reloads), or until some far longer timeout if you choose.
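One way to apply this advice in a Node.js server is sketched below: close the listener right away so a new process can take over the port, end keep-alive connections as soon as they are idle, let in-flight responses (such as a long download) finish, and destroy whatever is left only after a long deadline. `trackConnections` and the `drain` function it returns are hypothetical names; the sketch relies only on the standard `'connection'`, `'request'` and `'finish'` events.

```javascript
// Hypothetical helper: track connections so a draining server can close keep-alive
// sockets as soon as they are idle, while letting in-flight responses finish.
function trackConnections(server) {
  const active = new Map(); // socket -> number of in-flight requests
  let draining = false;

  server.on('connection', (socket) => {
    active.set(socket, 0);
    socket.on('close', () => active.delete(socket));
  });

  server.on('request', (req, res) => {
    const socket = req.socket;
    active.set(socket, (active.get(socket) || 0) + 1);
    res.on('finish', () => {
      if (!active.has(socket)) return; // socket already gone
      const left = active.get(socket) - 1;
      active.set(socket, left);
      if (draining && left === 0) socket.end(); // don't keep this connection alive
    });
  });

  return function drain(hardTimeoutMs) {
    draining = true;
    server.close(); // stop accepting: the port is released for the new process
    for (const [socket, inFlight] of active) {
      if (inFlight === 0) socket.end(); // idle keep-alive connection
    }
    // Last resort for connections that never finish (e.g. a stuck download).
    const timer = setTimeout(() => {
      for (const socket of active.keys()) socket.destroy();
    }, hardTimeoutMs);
    timer.unref(); // don't keep the process alive just for this timer
    return timer;
  };
}
```

The hard deadline can be much longer than a typical shutdown timeout, since the new process is already serving and nothing waits on the old one except its own remaining connections.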

Client Reconnection

My understanding of the (JavaScript) hub client is that if a connection is lost, it enters a 'Reconnecting...' phase which attempts to reconnect. If it can't do so, it will enter a 'Disconnected' state which is where it'll stay until asked to start again.
How long is the 'Reconnecting...' phase meant to last before it gives up? I've read 40 seconds before, but my client seems to take much less time, about 10 seconds, maybe less. [EDIT: Never mind this part, I had configured a 10-second disconnect timeout on the server as a test... and forgot. I understand this is set by the server during negotiation. Makes sense!] ... I'd prefer to have the client retry continually until it is told to abort. Can this be done, and would it cause issues?
Another question; during the Reconnecting... phase, if I attempt to call a hub method (again, in JS) it never seems to complete. I'm using the returned Deferred to check for 'done' and 'fail' events, but neither seems to get called. Is this by design?
Thanks.

You can definitely have it continually reconnect.
Handle the disconnected event on the client and call connection.start:
$.connection.hub.disconnected(function() {
    setTimeout(function() {
        $.connection.hub.start();
    }, 5000); // Re-start connection after 5 seconds
});
The only issue this could cause is that client machines might keep firing requests indefinitely at a server that isn't there. This becomes even more troublesome on mobile devices (it drains the battery like crazy).
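A common way to soften that problem is to retry with capped exponential backoff and random jitter instead of a fixed 5-second delay. This technique is swapped in here and is not part of the answer above. A minimal sketch; `nextDelay`, the 1-second base and the 60-second cap are hypothetical choices, and the commented wiring reuses the SignalR calls shown above.

```javascript
// Capped exponential backoff with "full jitter": attempt 0 waits up to 1 s,
// attempt 1 up to 2 s, attempt 2 up to 4 s, ... never more than capMs.
function nextDelay(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Possible wiring with the SignalR JS client used above (hypothetical, not run here):
// let attempt = 0;
// $.connection.hub.disconnected(function () {
//   setTimeout(function () {
//     $.connection.hub.start().done(function () { attempt = 0; });
//   }, nextDelay(attempt++));
// });
```

The random jitter also keeps many clients from reconnecting in lockstep when a server comes back.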
When you attempt to call a hub method while reconnecting, SignalR will try to send your command. Since there are 2 channels, one for receiving data and one for sending (for all transports except WebSockets), in some cases it is still possible to send requests while you're offline. Therefore SignalR does not know that a request has failed until the browser tells it that it could not make the request successfully.
Hope this helps!

I might have a clue... Touching Web.config triggers an app pool recycle, meaning that a new worker process is created for new requests while the existing process continues for a while, until the remaining requests end or the timeout is reached. Requests that do not end within the timeout period are terminated.
The SignalR client reconnects to the new process while the long-running task is still running in the old process, so when the long-running task does
GlobalHost.ConnectionManager.GetHubContext<ForceHub>();
you actually get a reference to the "old" hub while the client is connected to the "new" hub.
That's why the test performed by Wasp worked: he was making a new request to publish on the SignalR hub, and that request was processed in the newly created worker process.
You could try to configure a SignalR backplane (https://www.asp.net/signalr/overview/performance/scaleout-in-signalr); it's really easy to configure using SQL Server (https://www.asp.net/signalr/overview/performance/scaleout-with-sql-server). The backplane should be able to connect the two worker processes, and hopefully you will get the notification on the client.
If this is the problem, notifications generated by new requests will work even without the backplane. Note that the real purpose of the backplane is to scale out SignalR, that is, to connect a farm of web servers to each other.
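The recycle scenario and the backplane fix can be modelled in a few lines: each worker process has its own hub that only knows its own clients, so a message published from the old process never reaches a client that reconnected to the new one unless a shared bus relays it. This is a toy sketch; Node's `EventEmitter` stands in for the SQL Server backplane, and `Hub` is a hypothetical class, not SignalR's API.

```javascript
const { EventEmitter } = require('events');

// One hub per worker process; it can only deliver to clients connected to that process.
class Hub {
  constructor(name, backplane = null) {
    this.name = name;
    this.clients = new Map(); // clientId -> array of received messages
    this.backplane = backplane;
    if (backplane) backplane.on('message', (msg) => this.deliverLocally(msg));
  }

  connect(clientId) {
    this.clients.set(clientId, []);
    return this.clients.get(clientId);
  }

  deliverLocally(msg) {
    for (const inbox of this.clients.values()) inbox.push(msg);
  }

  // Broadcast to all clients: through the backplane if there is one, else local only.
  broadcast(msg) {
    if (this.backplane) this.backplane.emit('message', msg);
    else this.deliverLocally(msg);
  }
}
```

Without a backplane, `oldHub.broadcast(...)` only reaches clients still attached to the old process; with one, every process relays the message to its own clients.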
Also keep in mind that running long-running tasks inside IIS is hard to get right because, among other things, IIS does regular app pool recycles and has timeout limits for request execution. I recommend that you read the following post: http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
“If you think you can just write a background task yourself, it's likely you'll get it wrong. I'm not impugning your skills, I'm just saying it's subtle. Plus, why should you have to?”
Hope this helps

How can Nginx be upgraded without dropping any requests?

According to the Nginx documentation:
If you need to replace nginx binary with a new one (when upgrading to a new version or adding/removing server modules), you can do it without any service downtime - no incoming requests will be lost.
My coworker and I were trying to figure out: how does that work? We know (we think) that:
Only one process can be listening on port 80 at a time
Nginx creates a socket and connects it to port 80
A parent process and any of its children can all bind to the same socket, which is how Nginx can have multiple worker children responding to requests
We also did some experiments with Nginx, like this:
Send a kill -USR2 to the current master process
Repeatedly run ps -ef | grep unicorn to see any unicorn processes, with their own pids and their parent pids
Observe that the new master process is, at first, a child of the old master process, but when the old master process is gone, the new master process has a ppid of 1.
So apparently the new master process can listen to the same socket as the old one while they're both running, because at that time, the new master is a child of the old master. But somehow the new master process can then become... um... nobody's child?
I assume this is standard Unix stuff, but my understanding of processes and ports and sockets is pretty darn fuzzy. Can anybody explain this in better detail? Are any of our assumptions wrong? And is there a book I can read to really grok this stuff?

For specifics: http://www.csc.villanova.edu/~mdamian/Sockets/TcpSockets.htm describes the C library for TCP sockets.
I think the key is that after a process forks while holding a socket file descriptor, the parent and child are both able to call accept() on it.
So here's the flow. Nginx, started normally:
Calls socket() and bind() and listen() to set up a socket, referenced by a file descriptor (integer).
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Then Nginx forks. The parent keeps running as usual, but the child immediately execs the new binary. exec() wipes out the old program, memory, and running threads, but inherits open file descriptors: see http://linux.die.net/man/2/execve. I suspect the exec() call passes the number of the open file descriptor as a command line parameter.
The child, started as part of an upgrade:
Reads the open file descriptor's number from the command line.
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Tells the parent to drain (stop accept()ing, and finish existing connections), and to die.
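A Node.js sketch of the child's side of that hand-over follows. For what it's worth, nginx itself passes the inherited descriptor numbers in an environment variable named NGINX rather than on the command line, so the sketch reads them from there; the `;`/`:`-separated format and the function names are assumptions of this sketch. `server.listen({ fd })` is Node's way to accept on an already-open descriptor (Unix only), and it only works if the parent left the descriptor open across exec().

```javascript
const net = require('net');

// Parse inherited listening descriptors from an environment variable such as "3;4;".
function inheritedFds(env = process.env, name = 'NGINX') {
  const value = env[name];
  if (!value) return [];
  return value
    .split(/[;:]/)
    .filter((s) => s !== '')
    .map(Number)
    .filter((fd) => Number.isInteger(fd) && fd >= 0);
}

// Start accepting on each inherited descriptor instead of calling bind() again,
// so the port is never released during the upgrade. (Unix only.)
function listenOnInherited(fds, handler) {
  return fds.map((fd) => net.createServer(handler).listen({ fd }));
}

// Usage in the new binary (not run here):
// listenOnInherited(inheritedFds(), (socket) => socket.end('hello\n'));
```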

I have no idea how nginx does it, but basically, it could just exec the new binary, carrying the listening socket with it into the new process (actually, it remains the same process; it just replaces the program executing in it). The listening socket has a backlog of incoming connections, and as long as the new program boots up fast enough, it should be able to start processing them before the backlog overflows. If not, it could probably fork first, exec, and wait for the child to boot up to the point where it's ready to process incoming requests, then hand over command of the listening socket (file descriptors are inherited when forking, so both have access to it) via some internal mechanism, before exiting. Given your observations, this looks like what it's doing (if your parent process dies, your ppid is reassigned to init, i.e. pid 1).
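The ppid change observed in the question is ordinary Unix orphan handling, which a toy process table can show: when a parent exits, its children are re-parented to init (pid 1). The sketch ignores Linux "subreaper" processes, which can adopt orphans instead of pid 1; `ProcessTable` is a hypothetical class.

```javascript
// Toy process table: just enough to show fork, exit and re-parenting.
class ProcessTable {
  constructor() {
    this.nextPid = 2;
    this.ppid = new Map([[1, 0]]); // pid -> parent pid; pid 1 is init
  }

  fork(parent) {
    const pid = this.nextPid++;
    this.ppid.set(pid, parent);
    return pid;
  }

  exit(pid) {
    this.ppid.delete(pid);
    // Orphaned children are adopted by init.
    for (const [child, parent] of this.ppid) {
      if (parent === pid) this.ppid.set(child, 1);
    }
  }
}
```

In nginx terms: `kill -USR2` makes the old master fork and exec the new binary (the new master is its child), `kill -WINCH` gracefully shuts down the old workers, and `kill -QUIT` makes the old master exit, at which point the new master's ppid becomes 1.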
If it has multiple processes competing to accept on the same listening socket (again, I have no idea how nginx does it; perhaps it has a dispatching process?), then you could replace them one by one by ordering them to exec the new program, as above, but one at a time, so as never to drop the ball. Note that during such a process there would never be any new pids or parent/child relationship changes.
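The one-at-a-time replacement suggested in that paragraph can be sketched as follows: while each worker re-execs, the others keep accepting, so at least n - 1 workers are always serving and no pid changes. Workers are plain objects and `rollingReplace` is a hypothetical name; as the paragraph itself says, this is not necessarily what nginx does.

```javascript
// Replace workers one at a time; record how many were accepting at each step.
function rollingReplace(workers, newVersion) {
  const acceptingCounts = [];
  for (const w of workers) {
    w.accepting = false; // this worker is exec'ing the new program...
    acceptingCounts.push(workers.filter((x) => x.accepting).length);
    w.version = newVersion; // ...same pid, new program
    w.accepting = true; // back to accept()ing on the shared socket
  }
  return acceptingCounts;
}
```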
At least, I think that's probably how I would do it, off the top of my head.
