ZeroMQ Client Lose Connection - tcp

I have a client (PULL) connect to the server (PUSH). At first they work just fine. But later the connection is broken, and client-side ZeroMQ doesn't try to reconnect to server.
One mysterious thing is that if I do netstat in client side and server side, the client side shows the connection is still ESTABLISHED, while the server side doesn't have the corresponding entry. I suppose this is the reason why client-side doesn't do reconnecting.
PS: client and server are in differenct IDC, and there is a band limit between them. But when the disconnection happens, our monitor shows it does not hit the band limit.
And, when I do netstat in server side (when the connection is fine), sometimes the Send-Q column is very big, and then drop down to 0.
That's all the information I have. If you need more details please tell me.

I realize this is a very old question, but I ran into almost the exact same issue and found this while trying to find a fix. I believe I have fixed my issue so hopefully this helps someone at some point.
I had the same scenario but with ROUTER -> ROUTER. Everything worked great at first but after ~15 minutes of not sending any messages, messages would no longer make it. Then I found: http://api.zeromq.org/3-2:zmq-setsockopt. The three socket options that worked for me were ( using pyzmq ):
# self.client is my socket here
self.client.setsockopt(zmq.TCP_KEEPALIVE, 1)
self.client.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 300)
self.client.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 300)
These override the OS settings and I'm no longer seeing the connection timeout or drop.

Related

SMSC has many connections with the client, but the client has only one single connection

First off, there a similar question with the same issue here, but there is no answer, so I rewrote the question once again in more detail.
I am connected to an SMSC, and I noticed that there are a lot of messages are not delivered to us, we asked the SMSC to check the routing and it was fine, but SMSC noticed that there are too many connections established from your side to his side, although, we have one single connection only.
I was using NowSMS SMPP Client application to handle the connectivity, then, the SMSC asked me to change the application although I was thinking that NowSMS had no issues as I am using it 7 years ago, however, I asked NowSMS's team to investigate by opening a support ticket.
Later, I had to change NowSMS and install Kannel on a new Linux machine, after getting connected over Kannel to the SMSC, we got the same issue once again, and when I read all Kannel's logs, I found "System error (104): Connection reset by peer" which makes me, logically, to open a new connection with the SMSC. Accordingly, I suggested to have a live TCP trace from both sides at the same time, and I found the below packet in Wireshark trace file:
As you see, this is a RST/ACK from SMSC to me without requesting RST or anything from my side, and when I asked them why do you send RST/ACK or why do you RST the connection, I didn't get any useful answere, but they told me to read more about the RST/ACK and RST and I have no idea about networking, but when I read, I found that I had no control on RST connection as there was no requests from my side to the SMSC asking for the same. They always guid me to this post and what I see that it doesn't belong to me.
NOW: I just need to know what should I do or what should I ask whom about? As, I asked the Data Center's team about the same, and they confirmed that the VPN between me and the SMSC works normally without any exceptions. I believe, that there is no issue in application layer, but I cannot recognize the root of the issue.
P.S. Kannel's log file, and both TCP Trace file are here
Ask them to activate the Enquire link packet in order to drop inactive connections. It's clearly a problem from their side.

how to close and reopen TCP connection in QuickFix

Using QuickFixN, if I restart my trading application I occasionally am unable to logon, getting a "an existing connection was forcibly closed by the remote host" error.
The QuickFix engine retries to connect every 30secs, but always gets the same error.
If I close my application and re-open, it will connect correctly.
Speaking to my broker, it seems that they are rejecting my logins because they did not recognize my connection as being closed first time. 2nd time around, me forcing the application to close will tear-down the TCP connection, meaning that 3rd time logins work.
So my question is: is there a way to close and re-open the TCP connection without restarting the application?
Sounds like the problem is kinda on their end. Since the problem happens when you don't formally log out (e.g crash or abnormal termination), that means that their implementation apparently doesn't recognize the TCP termination.
At a higher-than-TCP layer, their FIX engine should somewhat compensate. If a few heartbeat durations occur after your disconnect, their implementation should realize you're not there anymore, since you're not responding to heartbeats.
So, neither their low-layer TCP handlers nor their FIX engine are able to set the right flag somewhere in their system that says you've gone offline. That's weird. I don't see what you can do about that, aside from intentionally doing a startup/shutdown to kludge their state flag for you.
I'm usually really hesitant to blame the other side (especially because I run the QF/n project), but that's where I'm at with the information provided.

FMS server died daily while enable rtmfp

I have a fms 3 server runing a video chat room application. It goes well except everyday it will die once or twice. After restarting the fms server, everything goes working again.
I really need to know the reason why fms server can die.
I checked its log, i saw many
"Server rejected an invalid flow."
Any hint will be greatest welcome.
This error can be caused by making an attempt to make a P2P connection to the server's peer ID. Connections to the server need to use
http://forums.adobe.com/thread/845685
i believe the problem is that you are attempting to make a P2P connection to the server's peer ID; that is, something like
var ns:NetStream = new NetStream(netConnection, netConnection.farID);
ns.play(...);
under the covers, this will open a new RTMFP flow to the server that will appear to the server as a new incoming client, but the initial handshake will be incorrect (the first/only command message is "play" instead of "connect"). i see this on Cirrus all the time.
it's possible that FMS doesn't account properly when rejecting these flows (leaving the connection count higher than it should be), or perhaps it leaves the flow open waiting for a "connect" message that will never come, so the connection count is legitimately higher than you think it is.
in any case, make sure you're not opening a P2P stream to the server's peer ID.
However, this error may not actually be related to the crashes. Additionally, are you even sure FMS is crashing and not just your application? If it's just your application, review your application logs (instead of the core FMS logs) and if you don't have anything useful add more logging to your application.

TcpListener stops accepting or accepts broken connections

We currently experience a problem with a self-written server application running on Windows (occurs on different versions). The server listens at a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops to work: Log files show that connections are still accepted, but that at the first read attempt a socket error (10054 - Connection reset by peer) occurs. I don't think it is a client issue because it suddenly stops working for all clients.
Now we found out, that the same problem occurs with our old server software, that is even written in another programming language. So it doesn't seem to be an error in our program - I think it has to be some kind of OS / firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue yet.
Any ideas where to look into? Wireshark logs will follow soon..
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time as the connection is accepted, in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue - maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol.
Otherwise you may ask Wireshark to see what is going on.
First, thanks for many hints - I'm afraid the problem was a completely different one which you couldn't possibly solve by reading my question.
The server application uses log4net, configured with a log file an ImmediateFlush = true. If every log statement is directly written into the file and multiple socket connections occur this slows down the whole application.
The server needed about a minute to really accept the connection. This was far more than the timeout on clientside. So in the log there was only shown "accepted" followed by "disconnected" - even the log was delayed!
Sorry for the inconvenience...
Have you tried changing the backlog and then see how much time or how many clients are served before this problem occurs
You don't say what Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error is stating that the client is dropping the connection; if you see the same error on the client side then it is a firewall or proxy that is dropping the connection (both side seeing the opposite side dropping the connection is indicative of a proxy/firewall).
If the error is not present on the client side; then I would say that your client side is where you will see the actual error.

How to detect if a client has crashed (or exit) for a server using Qt

The client use ssh login and start up a server on remote machine, then the clinet create a tcp connect to the server.
The server need exit when the client has exit normally or crashed or network is dropped.
So the question is how to detect if the client which the server has connected to is crashed.
The first try is using error() signal, catch QAbsoluteSocket::NetworkError to determine the network has dropped. But I can't receive error() signal at all even if i pull out the network cable.
The second try is using the SocketState, i think whenever SocketState is UnconnectedState,the client may has exit normally and the server should exit too. This way works fine for "normal exit", but I don't know how to deal with "crash" and "dead network".
Help me, thanks!
I'd recommend using TCP keep alive. It is not exposed through the public QTcpSocket interface, but you can use setsockopt with QAbstractSocker::socketDescriptor to activate the SO_KEEPALIVE feature.
EDIT: It appears that keep alive was added to QAbstractSocket at some point. So, simply call QAbstractSocket::setSocketOption with QAbstractSocket::KeepAliveOption.
You can find information about adjusting the timeout of keep alive request here: http://www.gnugk.org/keepalive.html
Most of the time, the only way you will know there is a problem with a socket connection is when you try to read or write with it. There are some exceptions: Windows will change the state of sockets if the network cable is unplugged, Linux (in my experience) will not.
The most reliable way to detect connection problems is to have the client regularly send a small message at an agreed upon interval with the server. If the server does not see this message within a reasonable time, it should consider the client dead and drop the connection. This will also give both sides regular opportunities to detect a problem via reads and writes.

Resources