SMSC sees many connections from the client, but the client has only one single TCP connection

First off, there is a similar question with the same issue here, but it has no answer, so I have rewritten the question in more detail.
I am connected to an SMSC, and I noticed that a lot of messages are not being delivered to us. We asked the SMSC to check the routing, and it was fine, but they noticed that there were too many connections established from our side to theirs, although we have only one single connection.
I was using the NowSMS SMPP Client application to handle the connectivity. The SMSC asked me to change the application, although I didn't think NowSMS had any issues, as I had been using it for 7 years; still, I asked NowSMS's team to investigate by opening a support ticket.
Later, I replaced NowSMS and installed Kannel on a new Linux machine. After connecting to the SMSC over Kannel, we got the same issue once again, and when I read through all of Kannel's logs, I found "System error (104): Connection reset by peer", which logically forces my side to open a new connection to the SMSC. Accordingly, I suggested taking a live TCP trace from both sides at the same time, and I found the following packet in the Wireshark trace file:
As you can see, this is a RST/ACK from the SMSC to me without any RST or similar request from my side. When I asked them why they send RST/ACK, or why they reset the connection at all, I didn't get any useful answer; they just told me to read more about RST/ACK and RST. I know little about networking, but when I read up on it, I found that I have no control over the reset, as there was no request from my side asking the SMSC for it. They keep pointing me to this post, but as far as I can see it doesn't apply to my case.
NOW: I just need to know what I should do, or whom I should ask about what. I asked the data center's team about this, and they confirmed that the VPN between me and the SMSC works normally without any exceptions. I believe there is no issue at the application layer, but I cannot find the root of the issue.
P.S. Kannel's log file and both TCP trace files are here

Ask them to activate enquire_link packets (SMPP's keep-alive PDU) so that inactive connections are detected and dropped cleanly. It's clearly a problem on their side.
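For reference, enquire_link is a bodiless PDU with command_id 0x00000015 in the SMPP 3.4 spec. Below is a minimal Python sketch of sending one periodically, assuming an already-established SMPP session socket; the interval and the simplistic response handling are illustrative, not taken from the question:

import socket
import struct
import time

ENQUIRE_LINK = 0x00000015  # SMPP 3.4 command_id; the response is 0x80000015

def enquire_link_pdu(seq: int) -> bytes:
    # SMPP header: command_length, command_id, command_status, sequence_number,
    # all big-endian 32-bit integers; enquire_link has no body, so length is 16.
    return struct.pack(">IIII", 16, ENQUIRE_LINK, 0, seq)

def keepalive_loop(sock: socket.socket, interval: float = 30.0) -> None:
    seq = 1
    while True:
        sock.sendall(enquire_link_pdu(seq))
        # A healthy SMSC answers with enquire_link_resp; a dead link surfaces
        # here as a timeout or reset instead of silently piling up connections.
        header = sock.recv(16)
        length, cmd_id, status, resp_seq = struct.unpack(">IIII", header)
        seq += 1
        time.sleep(interval)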

Related

What are the general rules for getting the 104 "Connection reset by peer" error?

Are there any general rules on when a website sends out a TCP reset, triggering the Connection reset by peer error?
Like
too many open connections
too high bandwidth use
connected for too long
…?
I'm pretty certain that there is no law governing this and that different websites/web developers have different tastes, but I would be interested to know whether there are some general rule sets (from websites, from textbooks on the subject, or from what you have been taught in school or at work) that are mostly followed.
Reason why I'm asking, of course, is that I want to get around being blocked…
I'm downloading some government data that is freely available but lacks an API or similar, so the two official ways to get it are either clicking around in some web GIS a few thousand times, or going down the Kafkaesque path of explaining to various levels of clerks the concepts of databases, CSV files, and ZIP files, and that you can't (and wouldn't need to, if they just did what you're trying to explain) simply drive to their agency with a "giant" hard drive. So I'm trying to take the most resource-saving route for everyone involved…
A website does not "send" a "Connection reset by peer" error. This error is generated by the OS kernel on the client side when it receives a TCP reset for an active connection. There are many reasons such a TCP reset might be sent. It might be sent by design as part of some load limit, for example to limit the number of connections from the same IP address within a specific time as a form of DoS protection, to restrict data scraping, or to enforce some kind of fair use. There is no general rule, let alone a law, for these explicit limits.
A TCP reset might also be caused by the application being overloaded, the application crashing, the system running out of resources, and so on.
And a TCP reset will happen if the client writes to a connection which the server already considers closed. This can happen, for example, with HTTP keep-alive: the server may close the connection on inactivity at any time after the HTTP response has been sent. If the client sends a new request on the same connection at the same time the server closes it, the server will reject the new request (since the connection is closed on the server end) and send a TCP RST, causing a connection reset by peer at the client. The client needs to handle this situation properly by creating a new connection and sending the request again (provided that the request is safe to retry, e.g. an idempotent request).
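To make that retry concrete, here is a minimal Python sketch using the standard library; host, path and the retry count are placeholders, and it assumes the request is an idempotent GET:

import http.client

def fetch_with_retry(host: str, path: str, retries: int = 2) -> bytes:
    conn = http.client.HTTPConnection(host)
    for _ in range(retries + 1):
        try:
            conn.request("GET", path)
            return conn.getresponse().read()
        except ConnectionResetError:
            # The server closed the kept-alive connection under us:
            # discard it and retry the idempotent request on a fresh one.
            conn.close()
            conn = http.client.HTTPConnection(host)
    raise RuntimeError("gave up after repeated connection resets")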

Why doesn't using UDP for video-on-demand cause cross-talk?

While reading the assignment questions in "Data Communication and Networking" by Behrouz Forouzan, one of them asked whether using UDP for file transfer has any adverse effects, keeping the process-crash phenomenon in mind.
The given solution says: if a process A asks a server X for the contents of a file and crashes soon after the request, and another process B comes up on the same port on the same machine (giving it the same socket address) and requests another file from the same server, but that request is lost, then the server knows neither that A crashed nor that B's request was lost, and hence sends the contents of the file A asked for to B.
Why doesn't this problem occur in a video-on-demand service like YouTube or the like?
One of the closest answers I found is this, but it doesn't seem to address my problem:
When is it appropriate to use UDP instead of TCP?
UPDATE: For anyone who would like to read the question as given in the book, I found an online version of the relevant part; please have a look at the 8th question in the PDF:
http://ceng334.cankaya.edu.tr/uploads/files/file/network%20sample.pdf
In theory the problem could happen, but in real life? Not a chance.
Let's say a user wants to stream a video from YouTube with a browser. All of the following would have to coincide:
The browser must crash - realistically this does not happen too often.
The new browser instance takes the exact same source UDP port - virtually never happens.
The user decides to look at a different video - makes no sense.
While all this happens, the server side does not time out - I don't think so.
This is like arguing that TCP should be used because a packet might get dropped on the wire when two computers are connected back to back with a one-meter Ethernet cable.
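If you want to see why the port-reuse step alone is so unlikely, here is a small Python sketch: binding a UDP socket to port 0 lets the OS pick an ephemeral port, and two fresh sockets practically never receive the same one:

import socket

a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("0.0.0.0", 0))  # port 0 asks the OS for an ephemeral port
b.bind(("0.0.0.0", 0))
print(a.getsockname(), b.getsockname())  # two distinct ephemeral ports
a.close()
b.close()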

ZeroMQ client loses connection

I have a client (PULL) connected to a server (PUSH). At first they work just fine, but later the connection breaks, and the client-side ZeroMQ doesn't try to reconnect to the server.
One mysterious thing: if I run netstat on both sides, the client side shows the connection as still ESTABLISHED, while the server side has no corresponding entry. I suppose this half-open state is the reason the client side doesn't reconnect.
PS: the client and server are in different IDCs, and there is a bandwidth limit between them. But when the disconnection happens, our monitoring shows we are not hitting that limit.
Also, when I run netstat on the server side (while the connection is fine), sometimes the Send-Q column is very big and then drops back to 0.
That's all the information I have. If you need more details please tell me.
I realize this is a very old question, but I ran into almost exactly the same issue and found this thread while looking for a fix. I believe I have fixed my issue, so hopefully this helps someone at some point.
I had the same scenario, but with ROUTER -> ROUTER. Everything worked great at first, but after ~15 minutes of not sending any messages, messages would no longer make it through. Then I found http://api.zeromq.org/3-2:zmq-setsockopt. The three socket options that worked for me were (using pyzmq):
import zmq

# self.client is my socket here; these options must be set before connect()
self.client.setsockopt(zmq.TCP_KEEPALIVE, 1)          # enable TCP keepalive probes
self.client.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 300)   # first probe after 300 s idle
self.client.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 300)  # repeat probes every 300 s
These override the OS settings, and I'm no longer seeing the connection time out or drop.
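For completeness, a self-contained sketch of the same fix applied to the asker's PULL socket; the endpoint is a placeholder, and note that the options must be set before connect():

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.setsockopt(zmq.TCP_KEEPALIVE, 1)          # enable TCP keepalive probes
sock.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 300)   # first probe after 300 s of idle
sock.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 300)  # repeat probes every 300 s
sock.connect("tcp://server:5557")              # placeholder endpoint
while True:
    msg = sock.recv()  # keepalives let the TCP stack notice a dead peer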

TcpListener stops accepting or accepts broken connections

We are currently experiencing a problem with a self-written server application running on Windows (it occurs on different versions). The server listens on a TCP port, accepts connections, exchanges some data and then closes the connections again. There are about 100 clients that connect from time to time.
Sometimes the server stops working: log files show that connections are still accepted, but that a socket error (10054 - Connection reset by peer) occurs at the first read attempt. I don't think it is a client issue, because it suddenly stops working for all clients.
We have now found out that the same problem occurs with our old server software, which is even written in a different programming language. So it doesn't seem to be an error in our program; I think it has to be some kind of OS/firewall issue? Of course, firewalls have been deactivated, which didn't solve the issue.
Any ideas where to look? Wireshark logs will follow soon…
Excerpt from the log (Timestamp, Thread Id, message)
11:37:56.137 T#3960 Connection from 10.21.13.3
11:37:56.138 T#3960 Client Exception: Socket Error # 10054
Connection reset by peer.
11:37:56.138 T#3960 ClientDisconnected
11:38:00.294 T#4144 Connection from 10.21.13.3
You can see that the exception occurs almost at the same time the connection is accepted; in this case the client reconnects after a few seconds.
A "stateful" firewall or NAT keeps track of connections, and ought to send RSTs for connectiosn it doesn't know about. If the firewall loses track of connections for some reason, then you'll probably see random connections being reset.
Our router at work does this — it forgets about connections when the PPP connection dies, which is remarkably unhelpful when it rains and the DSL restart takes a bit too long. However, instead of resetting connections, it just drops packets (even more unhelpful!).
Sounds like a firewall or routing issue; maybe stale connections get disconnected after a timeout period. Are you using a ping/keepalive inside your protocol?
Otherwise you may use Wireshark to see what is going on.
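For illustration, a minimal application-level keepalive in Python, assuming a simple line-based protocol where the peer answers PING with PONG; the message format, interval and function name are made up:

import socket
import threading
import time

def start_heartbeat(sock: socket.socket, interval: float = 30.0) -> None:
    def loop() -> None:
        while True:
            try:
                sock.sendall(b"PING\n")  # peer is expected to answer b"PONG\n"
            except OSError:
                break  # send failed: connection is dead, let the app reconnect
            time.sleep(interval)
    threading.Thread(target=loop, daemon=True).start()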
First, thanks for the many hints. I'm afraid the problem was a completely different one, which you couldn't possibly have solved by reading my question.
The server application uses log4net, configured with a log file and ImmediateFlush = true. If every log statement is written directly to the file and multiple socket connections occur, this slows down the whole application.
The server needed about a minute to really accept a connection. This was far more than the client-side timeout. So the log only showed "accepted" followed by "disconnected" - even the log itself was delayed!
Sorry for the inconvenience...
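In log4net itself the cure would be setting ImmediateFlush to false or wrapping the appender in a BufferingForwardingAppender. The same principle, sketched with Python's standard logging since that's the language of the other snippets here (file name and capacity are placeholders):

import logging
import logging.handlers

file_handler = logging.FileHandler("server.log")  # placeholder file name
buffered = logging.handlers.MemoryHandler(
    capacity=200,              # flush to disk every 200 records...
    flushLevel=logging.ERROR,  # ...or immediately when an ERROR is logged
    target=file_handler,
)
logging.getLogger().addHandler(buffered)  # log calls no longer block on disk I/O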
Have you tried changing the backlog and then seeing how much time passes, or how many clients are served, before the problem occurs?
You don't say which Windows versions you're using for the server, but you should be aware that the Windows TCP/IP stack behaves differently in server and client OSes. There are limits on how many simultaneous incoming connections a client OS will allow, and they are significantly lower than you might expect.
What do the logs look like from the client side?
Since the error states that the client is dropping the connection: if you see the same error on the client side, then a firewall or proxy is dropping the connection (both sides seeing the opposite side drop the connection is indicative of a proxy/firewall).
If the error is not present on the client side, then the client side is probably where the reset actually originates.

Delay before sending message over socket - how does that help?

I have a TCP/IP socket interface to a third-party software app. I've implemented this interface at several customer sites with no problem. The latest customer, though... problems. We've turned on logging in the apps on either end, and also installed Wireshark on the PC to log raw TCP/IP traffic. With that, we've proved that my server app successfully sends the message out and the PC receives it, but the client app doesn't see it. (This is a totally intermittent problem, which is why it's such a pain to troubleshoot.)
The socket details are as simple as they come: one socket handling two-way communication between the server and the PC. The messages are plain ASCII text and fairly short (not XML). The server initiates communication by sending the first message, and the client then responds with several messages. The socket is kept open at all times while the apps are running. The client app is designed so that the end user can only process one case at a time, which prevents message collisions. They have some sort of polling set up; their app "hibernates" until it sees the initiating message from the server.
The third-party vendor has advised me to add a delay of a few seconds before I send them the initiating message. I can't see how that helps. If the client is "sleeping", just polling the socket waiting for a message, how does a delay before the first message help? It's not as if we send two messages and the second one gets lost; it's the first message that is lost. So I don't see how it matters whether we send that message now or two seconds from now.
I've asked them, and they haven't given me details. It could be that there are proprietary details in their code that they don't want to disclose to me, and that's fair. So I'm asking here, because I'm always learning new things about socket programming: maybe you can shed some light on how polling a TCP/IP socket can be affected by message timing?
Since it's someone else's client and they won't tell you what it's doing (other than saying "insert a delay"), the answer is probably that their client is reading and discarding the message because it's not yet in a state to deal with it. The delay gives the client time to get into a state where it can respond to the message properly.
In other words, the client has a race condition. One easy way this can happen is if they have one thread for reading messages and another for dealing with them.
Short of running strace(1) on the client to see what system calls it is making, it's tough to tell what the client is actually doing.
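A contrived Python sketch of such a race (all names are made up): a reader that discards anything arriving before its handler is installed loses the first message, while the same message sent after a short delay gets through:

import threading
import time

handler_ready = threading.Event()

def deliver(msg: bytes, send_delay: float) -> None:
    time.sleep(send_delay)  # stands in for when the server sends the message
    if handler_ready.is_set():
        print("handled:", msg)
    else:
        print("discarded:", msg)  # the intermittently "lost" first message

threading.Thread(target=deliver, args=(b"CASE_1", 0.0)).start()  # lost
threading.Thread(target=deliver, args=(b"CASE_2", 2.0)).start()  # handled
time.sleep(1)          # client initialisation takes about a second here
handler_ready.set()    # the vendor's "few second delay" lets this win the race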
