I was playing with WebSockets a bit (using Sails.js with its built-in socket support, which is based on Socket.io).
I noticed Chrome receives two frames every 25 seconds. I assumed this was some kind of heartbeat to check that the connection was still alive.
But then I stopped the server and Chrome was notified immediately.
I also force-killed the Node process with the kill command, and Chrome was still notified, which means it wasn't Node sending a signal before shutting down the server.
How does this happen?
Normal TCP socket connections do this, so it would be surprising if WebSockets didn't.
The server's kernel is responsible for cleaning up when the server process dies/exits/is killed. This includes releasing memory, closing files, and shutting down sockets. Cleanly shutting down a TCP socket means sending a FIN to tell the peer.
Interestingly, on some old versions of Windows with a userspace Winsock, this didn't happen if the server process crashed. On any OS with a compliant TCP stack it should be guaranteed, unless the kernel itself hangs, the machine loses power, or the network breaks.
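For illustration, a minimal sketch of what the client sees (Python, hypothetical host and port): the client blocks on recv(), and as soon as the server process dies, the server's kernel closes the socket and sends the FIN on the process's behalf, so the blocking read returns an empty byte string.

    # Sketch (hypothetical host/port): detect the peer going away
    # without any application-level heartbeat.
    import socket

    with socket.create_connection(("example.com", 8080)) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                # An empty read means the peer closed the connection (FIN received),
                # even if the server process was killed with SIGKILL.
                print("server closed the connection")
                break
            print("received", len(data), "bytes")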
Related
As you can see above, the TCP connection is released very slowly.
I'm wondering why this happens and whether it affects my program (at the HTTP layer)?
These are persistent connections, as defined by HTTP/1.1. When a client makes requests to the server, several requests can share one underlying TCP connection.
In your case a request was performed and the system waited for a while, expecting another request. After 30 seconds of inactivity it considered the connection idle and closed it (sent a TCP FIN).
As for the impact on your system: some resources are consumed for TCP connection handling. This may be an issue on huge servers handling millions of requests, but I don't think that is your case.
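For illustration, a small sketch (Python's stdlib http.client, hypothetical host) of two requests reusing one persistent connection; if the client then goes quiet past the server's keep-alive timeout, the server closes the idle connection itself:

    # Sketch: two HTTP/1.1 requests over the same TCP connection. If we then
    # stay idle past the server's keep-alive timeout, the server sends a FIN.
    import http.client

    conn = http.client.HTTPConnection("example.com", 80, timeout=10)

    conn.request("GET", "/")          # first request
    resp = conn.getresponse()
    resp.read()                       # drain the body so the connection can be reused

    conn.request("GET", "/about")     # second request, same underlying TCP connection
    resp = conn.getresponse()
    resp.read()

    conn.close()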
I am load testing my WebSocket Tornado server, running on Ubuntu Server 14.04.
I am playing with a big client machine loading 60,000 users, 150 per second (that's what my small server can comfortably take). The client is a RedHat machine. When a load test suite finishes, I have to wait a few seconds before I can rerun it.
Within these few seconds, my WebSocket server is handling the closing of the 60,000 connections. I can see it in my graphite dashboard (the server logs every connect and disconnect there).
I am also logging the relevant output of the netstat -s and ss -s commands to my graphite dashboard. When the test suite finishes, I can immediately see the established TCP connection count drop from 60,000 to ~0 within seconds. The other socket states (closed, timewait, synrecv, orphaned) remain constant and very low. My client's sockets go to timewait for a short period and then this number drops to 0 too. When I immediately rerun the suite, with all the TCP sockets on both ends free but the server not yet finished processing the previous closing batch, I see no changes at the TCP socket level until the server finishes and starts accepting new connections again.
My question is: where is the information about the sockets waiting to be established stored (on RedHat and Ubuntu)? No counter or queue length that I am tracking shows this.
Thanks in advance.
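For what it's worth, one place where incoming connections can wait before the application sees them is the listening socket's accept queue: the kernel completes the handshake for up to listen()'s backlog worth of clients and holds them there until the application calls accept(). A rough sketch of the idea (Python, hypothetical port 9000), in case those queues are what you are looking for:

    # Sketch: the kernel finishes the TCP handshake for up to `backlog` clients
    # and parks them in the listening socket's accept queue; the application
    # only sees them once it calls accept().
    import socket
    import time

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 9000))
    srv.listen(128)              # backlog: upper bound on the accept queue

    time.sleep(10)               # pretend we are still busy tearing down the old batch;
                                 # clients connecting now are queued by the kernel

    conn, addr = srv.accept()    # only now does the application see the client
    print("accepted", addr)
    conn.close()
    srv.close()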
I have built a proxy server that can balance between multiple nodes.
I also made it so that it can reload with zero downtime. The problem is that most of the nodes have keep-alive connections, and I have no clue how to handle these. Sometimes the server can't shut down because of one or two open connections that won't close.
My first thought was to set a timeout on the shutdown, but that does not guarantee that every connection is terminated correctly. Think of a download that takes several minutes to complete.
Can anyone give me some good advice on what to do in this case?
One option you have is to initially shut down just the listening sockets, and wait on the active connections before exiting.
Once you free up the listening sockets, your new process is free to start up and accept new connections. The old process can then continue running until all its connections are closed gracefully (this is how HAProxy does reloads), or until some far longer timeout if you choose.
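A rough sketch of that pattern (Python asyncio; the port, the SIGTERM trigger, and the drain timeout are just assumptions): the old process stops listening first, which frees the port for the new process, and then waits for its in-flight connections before exiting.

    # Sketch: stop accepting on reload, keep serving in-flight connections.
    import asyncio
    import signal

    async def handle(reader, writer):
        # Stands in for a long-lived keep-alive connection: echo until the client closes.
        while True:
            data = await reader.read(4096)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "0.0.0.0", 8080)
        stop = asyncio.Event()
        # Unix-only: treat SIGTERM as the "reload" signal.
        asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)

        await stop.wait()            # run until told to reload/shut down

        server.close()               # stop accepting: the port becomes free for the new process
        await server.wait_closed()

        # Let in-flight connection handlers finish, with a generous (assumed) timeout.
        pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
        if pending:
            await asyncio.wait(pending, timeout=300)

    asyncio.run(main())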
We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops working: log files show that connections are still accepted, but that a socket error (10054 - Connection reset by peer) occurs at the first read attempt. I don't think it is a client issue, because it suddenly stops working for all clients.
We have now found out that the same problem occurs with our old server software, which is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS / firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue.
Any ideas where to look? Wireshark logs will follow soon.
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted; in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol?
Otherwise you can use Wireshark to see what is going on.
First, thanks for the many hints - I'm afraid the problem was a completely different one, which you couldn't possibly have solved by reading my question.
The server application uses log4net, configured with a log file and ImmediateFlush = true. Every log statement is written directly to the file, so with multiple concurrent socket connections this slows down the whole application.
The server needed about a minute to actually accept the connection. This was far more than the timeout on the client side. So the log only showed "accepted" followed by "disconnected" - even the log itself was delayed!
Sorry for the inconvenience...
Have you tried changing the backlog and then seeing how much time passes, or how many clients are served, before this problem occurs?
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error states that the client is dropping the connection: if you see the same error on the client side, then it is a firewall or proxy that is dropping the connection (both sides seeing the opposite side drop the connection is indicative of a proxy/firewall).
If the error is not present on the client side, then I would say the client side is where you will see the actual error.
I always thought that if you didn't implement a heartbeat, there was no way to know if one side of a TCP connection died unexpectedly. If the process was just killed on one side and didn't exit gracefully, there was no way for the socket to send FIN or let the other side know that it was closed.
(See some of the comments here for example http://www.perlmonks.org/?node_id=566568 )
But there is a stock order server that I connect to that has a new "cancel all orders on disconnect" feature, which cancels live orders if the client disconnects. It works even when I kill the process on my end, and there is definitely no heartbeat from my app to it.
So how is it able to detect when I've killed the process? My app is running on Windows Server 2003 and the order server is on Suse Linux Enterprise Server 10. Does Windows detect that the process associated with the socket is no longer alive and send the FIN?
When a process exits - for whatever reason - the OS will close the TCP connections it had open.
There are numerous other ways a TCP connection can go dead undetected:
someone yanks out a network cable in between.
the computer at the other end gets nuked.
a NAT gateway in between silently drops the connection.
the OS at the other end crashes hard.
the FIN packet gets lost.
Though by enabling TCP keepalive, you'll detect it eventually - at least within a couple of hours.
It could be using a TCP Keep Alive to check for dead peers:
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
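For what it's worth, a minimal sketch of turning keepalive on for a socket in Python (the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT names are Linux-specific, and the host, port, and timing values are just example assumptions):

    # Sketch: enable TCP keepalive on a connected socket. With these example
    # values the kernel starts probing after 60s of idleness, probes every 10s,
    # and declares the peer dead after 5 failed probes.
    import socket

    sock = socket.create_connection(("example.com", 8080))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)    # idle time before the first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)   # interval between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)      # failed probes before giving up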
As far as I know, the OS detects the process termination and closes all the file descriptors/sockets/handles the process was using. So there isn't any difference between "killing" the application and "gracefully terminating" it. Of course, the kernel itself must be running (= PC turned on, wire connected...). But the job of sending the FIN and so on is on the OS...
Also, if a host becomes unreachable (turned off, disconnected...), an intermediate gateway (or the client itself) may detect the event (e.g. loss of carrier, DHCP lease not renewed...) and reply to packets sent to the dead host with an ICMP error (host/network unreachable). This causes the peer's TCP connection to die, but it only happens if the client has some packet to send to the host.