Pings go through but application sporadically loses connectivity. - tcp

Question
Is there a possibility that pings have a 100% success rate while other TCP traffic fails randomly? If so, why, and how can the issue be resolved?
Background
We are binding to a USSD gateway, but we sporadically lose connectivity and are forced to rebind. It is my understanding that TCP connections are left open unless they are closed by the application layer, or unless a node between the client and the USSD gateway (such as a NAT router or a firewall) considers them stale and closes them. The counter-intuitive part is that pings between the client and the USSD gateway have a 0% failure rate.
So the question is: is it possible that pings never fail while other TCP traffic does? If so, why?

Yes. ICMP (which ping uses) and TCP are different protocols and are handled by different software.
In my experience, the usual reason for such problems is that the TCP server and/or client is buggy.
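A quick way to see the difference in practice is to probe the TCP port directly instead of relying on ping. Below is a minimal sketch in Python; the gateway host, port, and probe interval are placeholder assumptions, not values from the question:

    import socket
    import time

    HOST = "ussd-gateway.example.com"   # placeholder: the gateway address
    PORT = 2775                         # placeholder: the bind port you use

    def tcp_reachable(host, port, timeout=5.0):
        """Return True if a TCP connection can be established right now."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:
            print(time.strftime("%H:%M:%S"), "connect failed:", exc)
            return False

    while True:
        tcp_reachable(HOST, PORT)
        time.sleep(30)   # probe every 30 seconds

If the probe fails at the same moments the application loses its bind while ping stays clean, that confirms the problem is specific to TCP handling (a stateful middlebox or a buggy endpoint) rather than basic reachability.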

Related

TCP connection: After a while, server cannot send packets to client. Client can though

I think it relates just to the TCP layer, but I describe my setup in the following paragraph:
On Google Compute Engine I set up an HTTP and WebSocket server (Python, geventwebsocket + gevent.WSGIServer). At home I have my device (an ESP8266) that connects to it using WebSockets.
I use WebSockets because I need bidirectional communication (only a couple of messages a day; it goes like this: a message from the server, a response from the client). The connection itself is initiated by the client, as it's behind a NAT.
The problem is that a couple of seconds after the last packet exchange, messages from the server stop arriving at the client. However, the client can still send packets to the server even minutes later (and possibly much longer). And interestingly, once that happens, the presumably retransmitted packets from the server finally arrive.
I verified with Wireshark that the packets are indeed sent from the server (and retransmitted if not ACKed), and I log every network communication on the client, so the problem probably isn't in the application software. I get no exceptions in the applications. The connections stay open.
I tested how long after the connection initiation (or after the last delivered packet) the server can still send packets, and it is generally between 6 and 20 seconds, varying between tests. In the test, the server sends out packets with a fixed delay between them.
In a test of a couple of packets with a single fixed delay, usually either all packets arrive or none do (if one doesn't arrive, the next won't either).
I suspect that might be because of the NAT. But then the only solution I see would be to periodically (every 6 seconds or less) send keep-alive packets from the client (WebSocket ping/pong frames, or TCP keepalive). That doesn't seem elegant, though, as there should be only a few data messages in a day.
And a similar thing happens when SSHing from my desktop to the server: after a couple of seconds of inactivity on my side and the server's side, the server stops sending anything (tested e.g. with watch -n20 date). Sometimes it just freezes and doesn't update until I press a key, i.e. send a packet from the client. The update is not instant in the case of SSH either; it takes a couple of seconds after the keypress to see new output. Edit: of course that must be due to the retransmission timer algorithm.
So I studied what TCP keep-alive packets are for, and the thing is that routers and NATs forget connections/mappings after some time, or keep only the newest ones. (So I guess in the client-to-server direction the mapping is simply recreated, since the destination IP is public and belongs to the actual server; in the opposite direction that is not possible, so it doesn't work.)
But I didn't think it could be as bad as 6 seconds. WebSockets then almost reduce to polling (although with possibly smaller lag).
It seems that the router's NAT mechanism may be causing the problem. Maybe you can use tools like NAT-PMP or UPnP to open a port and create a mapping to your local client. Such a mapping lasts long enough for you to do bidirectional communication.
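If opening a port mapping isn't possible, the keep-alive idea from the question can also be done at the TCP level instead of with WebSocket ping/pong frames. Below is a minimal sketch assuming a Linux host and the standard socket API; the option names are Linux-specific, and the 5-second timers are only an illustration of staying under the observed ~6-second NAT timeout:

    import socket

    def enable_keepalive(sock, idle=5, interval=5, count=3):
        """Ask the kernel to send TCP keepalive probes on an otherwise idle socket."""
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # Linux-specific knobs: seconds before the first probe, seconds between
        # probes, and how many unanswered probes mark the connection as dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

    # Example: an outgoing connection that the NAT should now keep mapped.
    sock = socket.create_connection(("server.example.com", 8080))  # placeholder address
    enable_keepalive(sock)

The probes are empty segments handled by the kernel, so this keeps the NAT mapping alive without generating application-level messages.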

Should my IoT device poll with HTTP or listen with TCP?

I'm creating an IoT Device + Server system using .NET Micro Framework and ASP.NET WebAPI (Probably in Azure).
The IoT device needs to be able to frequently update the server with stats whilst also being able to receive occasional incoming commands from the server that would change its behaviour. In this sense, the device needs to act as both client and server itself.
My concern is getting the best balance between the security of the device and the load on the server. Furthermore, there must be a relatively low amount of latency between the server needing to issue a command and the device carrying it out; of the order of a few seconds.
As I see it my options are:
1. Upon connection to the internet, the device establishes a persistent TCP connection to the server, which is then used for both polling and receiving commands.
2. The device listens on a port (e.g. HttpListener) for incoming commands whilst updating the server via frequent HTTP requests.
3. The device only ever polls the server with HTTP requests. The server uses the responses to give the device commands.
The 2nd option seems to be the least secure as the device would have open incoming ports. The 1st option looks the most difficult to reliably implement as it would require low level socket programming. The 3rd option seems easy and secure but due to the latency requirements the device would need to poll every few seconds. This impacts the scalability of the system.
So at what frequency does HTTP polling create more overhead than just constantly keeping a TCP connection open? 5s? 3s? 1s? Or am I overstating the overhead of keeping a TCP connection open in ASP.NET? Or is there a completely different way that this can be implemented?
Thanks.
So at what frequency does HTTP polling create more overhead than just constantly keeping a TCP connection open? 5s? 3s? 1s?
There is nothing you need to do to keep a TCP connection open. The only thing you might need is TCP keep-alive (which has nothing to do with HTTP keep-alive!) in case you want to keep the connection idle (i.e. no data to send) for a long time.
With HTTP, your overhead already starts with the first request, since your data need to be encapsulated in an HTTP message. This overhead can be comparatively small if the message is large, or it can easily be much larger than the message itself for small messages. Also, HTTP servers close the TCP connection after some idle time, so you might need to re-establish the TCP connection for the next data exchange, which again costs overhead and latency.
HTTP has the advantage of passing through most firewalls and proxies, while plain TCP does not. You also get encryption almost for free with HTTPS, i.e. there are established standards for direct encrypted connections and for tunneling through a proxy.
WebSockets are something in between: you do an HTTP request and then upgrade HTTP to WebSocket. The initial overhead is thus as large as for HTTP, but for subsequent messages the overhead is not much higher than for plain TCP. You can also do WebSockets over HTTPS (i.e. wss:// instead of ws://). It passes through most simple firewalls and proxies, but deeper-inspection firewalls might still have trouble with it.
Setting up a TCP listener will be a problem if your IoT device sits behind a NAT router, i.e. the usual setup inside private or SoHo networks. To reach the device, one would need to open a tunnel at the router from the outside into the network, either by administering the router by hand or with UPnP (which is often switched off for security reasons). So you would introduce too many problems for the average user.
Which means that the option with the fewest problems for the customer is probably HTTP polling, but it is also the one with the highest overhead. Still mostly compatible are WebSockets, which have less overhead but slightly more compatibility problems; even less overhead can be reached with a plain TCP connection to the server. A TCP listener on the device, by contrast, would cause too much trouble.
As for resources on the server side: each HTTP polling request might use a new TCP connection, but you can also reuse an existing one. You are therefore trading more overhead and latency on the client side (a new TCP connection for each request), which needs fewer resources on the server side, against less overhead and latency on the client side (multiple HTTP requests per TCP connection), which needs more resources on the server side. With WebSockets and plain TCP connections you always need more server-side resources, unless your client automatically re-establishes the connection on loss of connectivity.
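To make the reuse point concrete, here is a minimal client-side polling sketch in Python using the requests library; a Session keeps the underlying TCP connection open between polls via HTTP keep-alive, whereas a bare requests.get() may open a new connection each time. The URL, response format, and poll interval are placeholder assumptions:

    import time
    import requests

    POLL_URL = "https://api.example.com/device/42/commands"   # placeholder endpoint

    session = requests.Session()   # reuses the TCP connection between requests

    while True:
        resp = session.get(POLL_URL, timeout=10)
        resp.raise_for_status()
        for command in resp.json():        # assumes the server returns a JSON list
            print("executing", command)    # placeholder for real command handling
        time.sleep(5)                      # poll every 5 seconds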
These days you should use an IoT-specific communication protocol over TLS for secure, lightweight connections. For example, AWS uses MQTT (http://mqtt.org/) and Azure uses AMQP (https://www.amqp.org/).
The idea is that you connect securely to a broker and then use a messaging protocol with topics to route messages to the proper devices. IBM has been using MQTT for a long time, and routers and firewalls now typically allow port 8883, which is MQTT over TLS.
Good Luck!
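As a rough sketch of what the broker approach looks like on the device side, here is MQTT over TLS on port 8883 with the Eclipse paho-mqtt Python client. The broker host, topics, and payload are placeholders, the callback signatures follow the paho-mqtt 1.x API, and real cloud brokers such as AWS IoT additionally require per-device certificates:

    import paho.mqtt.client as mqtt

    BROKER = "broker.example.com"            # placeholder broker host
    STATS_TOPIC = "devices/42/stats"         # topic the device publishes to
    COMMAND_TOPIC = "devices/42/commands"    # topic the device listens on

    def on_connect(client, userdata, flags, rc):
        print("connected, rc =", rc)
        client.subscribe(COMMAND_TOPIC)      # commands are pushed by the broker

    def on_message(client, userdata, msg):
        print("command received:", msg.payload.decode())

    client = mqtt.Client()
    client.tls_set()                         # MQTT over TLS using the system CA store
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(BROKER, 8883)

    client.publish(STATS_TOPIC, '{"temperature": 21.5}')   # example stats update
    client.loop_forever()                    # keeps the connection open, runs callbacks

The device keeps a single outgoing TLS connection to the broker, so it needs no open inbound ports and still receives commands with low latency.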
Simply use SignalR to connect client and server. It provides you minimal latency without polling. The API is very simple to use.
Physically, this runs over WebSockets, which scale to a large number of concurrent connections. If you don't need more than 100k connections per Windows server, this should not be a concern.

Requirements for Repeated TCP connects

I am using Winsock, and I have a need to issue a TCP connect repeatedly to a third-party server. These applications will stay up potentially for days at a time. I am the only client connecting to the server. The time between connects is on the order of seconds, and the connection stays up only long enough to send a single message of a few bytes. I am currently seeing that the connects start to fail (WSAECONNREFUSED) after a few hours. Is there anything I must do (e.g. socket options, etc.) to ensure these frequent repeated connects will succeed for an indefinite amount of time? Thanks!
When you are doing a lot of transaction-based connections and running into TCP's TIME_WAIT state duration (which lasts 2*MSL = 120 seconds), so that no more connections are available from the client host toward a particular server host, you should consider UDP and manage the re-sending of lost requests yourself.
I know that sounds odd. But standard services like DNS use UDP to handle a huge number of transactions (a request followed by a single answer, each in one UDP datagram) precisely to avoid the kind of issue you are experiencing. A resolver sends its request to the DNS server over UDP and simply re-sends it after a short timeout if no answer arrives. Sometimes the response is too large to fit in a UDP datagram; in that case the DNS server sends a UDP reply with the truncation flag set, asking the client to retry over TCP.
Moreover, you may also consider the T/TCP extension (Transactional TCP), if it is available on your Windows platform. It provides TCP reliability with a shorter TIME_WAIT state, at nearly no cost in modifications to your client code. As far as I know it may work even if the server does not support the extension. As a side note, it is not used on the internet nowadays, as it is known to have some flaws...
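Here is a minimal sketch of the UDP approach suggested above, with the client doing its own retransmission; the server address, payload, and timing values are placeholders:

    import socket

    SERVER = ("server.example.com", 9000)   # placeholder address of the third-party server

    def send_transaction(payload, retries=3, timeout=1.0):
        """Send one small request over UDP and wait for a reply, retrying on timeout."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            for _ in range(retries):
                sock.sendto(payload, SERVER)
                try:
                    reply, _addr = sock.recvfrom(4096)
                    return reply
                except socket.timeout:
                    continue    # no answer in time: re-send the request
        raise TimeoutError("no reply after %d attempts" % retries)

    # One short message per transaction, and no TIME_WAIT piling up on the client.
    print(send_transaction(b"STATUS"))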

Can TCP re-transmit a handshake when it’s lost in transport?

I saw a large number of failed connections between two hosts on my intranet (call them client and server).
Using netstat on both machines, I see corresponding port numbers where the server end is in SYN_RECV state and the client is in SYN_SENT.
My interpretation is that the server has responded to the client’s SYN with a SYN,ACK but this packet has been lost. The handshake is disrupted, the socket connection is in an incomplete state, and I see the client time out after 20-45 seconds.
My question is, does TCP offer a way for the server to re-transmit the SYN,ACK after some interval? Is this a good or bad idea?
More system details if relevant: both ends RHEL5, ssh succeeds, ping loses 100%, traceroute succeeds. Client is built on OpenOrb (Java), server is Mico (C++).
SYN and FIN flags are considered part of the sequence space and are transmitted reliably (so, the answer to your immediate question is "yes, it does, by default").
However, I think you really want to dig a bit deeper, because:
If you have a large number of failed connections between hosts on your intranet, this points to a problem in the network - normally you should see few, if any, connections stuck in these states. Retransmissions mean your connections will hiccup for 2, 4, 8, ... seconds (though not necessarily - it depends on the TCP stack; in any case, nothing pretty for the users).
I would advise running tcpdump or Wireshark on both hosts to trace where the packet loss happens - and then fixing it.
On older hardware, a frequent cause is a duplex mismatch on some pair of devices in the path (incorrectly autodetected, or incorrectly hardcoded). Other causes may be a problem with a driver, or a bad cable (not bad enough to cause a complete outage, but bad enough to cause periodic blackouts).
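On Linux (both ends here are RHEL5) you can also check how persistent the SYN and SYN,ACK retransmissions are; the retry counts are exposed under /proc. A small sketch, Linux-specific, where the kernel defaults are typically around 5-6 retries:

    # Linux-specific: how many times the kernel retransmits SYN (client side)
    # and SYN,ACK (server side) before giving up on a half-open connection.
    for name in ("tcp_syn_retries", "tcp_synack_retries"):
        with open("/proc/sys/net/ipv4/" + name) as f:
            print(name, "=", f.read().strip())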

Rejecting a TCP connection before it's being accepted?

There are three different accept versions in Winsock. Aside from the basic accept, which is there for standards compliance, there is also AcceptEx, which seems to be the most advanced version (due to its overlapped I/O capabilities), and WSAAccept. The latter supports a condition callback, which, as far as I understand, allows the rejection of connection requests before they are accepted (when the SO_CONDITIONAL_ACCEPT option is enabled). None of the other versions supports this functionality.
Since I prefer to use AcceptEx with overlapped I/O, I wonder why this functionality is only available in the simpler version.
I don't know enough about the inner workings of TCP to tell whether there's actually any difference between rejecting a connection before it has been accepted and disconnecting a socket right after the connection has been established. And if there is a difference, is there any way to mimic the WSAAccept functionality with AcceptEx?
Can someone shed some light over this issue?
When a connection is established, the remote end sends a packet with the SYN flag set. The server answers with a SYN,ACK packet, and after that the remote end sends an ACK packet, which may already contain data.
There are two ways to stop a TCP connection from forming. The first is resetting the connection - this is the same as the common "connection refused" message seen when connecting to a port nobody is listening on. In this case, the original SYN packet is answered with an RST packet, which terminates the connection attempt immediately and is stateless. If the SYN is resent, an RST will be generated for every received SYN packet.
The second is closing the connection as soon as it has been formed. At the TCP level there is no way to close the connection in both directions immediately - the only thing you can say is "I am not going to send any more data". So once the initial SYN, SYN,ACK, ACK exchange has finished, the server sends a FIN packet to the remote end. In most cases, telling the other end with FIN that "I am not going to send any more data" makes the other end close the connection as well and send its own FIN packet. A connection terminated this way isn't any different from a normal connection where no data happened to be sent. This means that the normal state tracking for TCP connections and the lingering close states will persist, just like for normal connections.
Now, on the C API side this looks a bit different. When you call listen() on a port, the OS starts accepting connections on that port. This means that it starts replying with SYN,ACK packets to incoming connections, regardless of whether the C code has called accept() yet. So, at the TCP level, it makes no difference whether the connection is closed before or after accept. The only additional concern is that a listening socket has a backlog, which is the number of not-yet-accepted connections it can have waiting before it starts sending RST to remote ends.
However, on Windows, the SO_CONDITIONAL_ACCEPT option allows the application to take control of the backlog queue. This means that the server will not answer a SYN packet at all until the application does something with the connection. It also means that rejecting connections at this level can actually send RST packets to the network without creating any state.
So, if you cannot somehow get the SO_CONDITIONAL_ACCEPT functionality enabled on the socket you are using AcceptEx on, the rejection will show up differently on the network. However, not many places actually use the immediate-RST functionality, so I would think a requirement for it indicates a very specialized system indeed. For most common use cases, accepting the socket and then closing it is the normal way to behave.
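If you do end up accepting and then closing, one way to make the close look like a reject on the wire is an abortive close: set SO_LINGER to zero before closing, which sends an RST instead of a FIN and skips the lingering close states. A small illustrative sketch in Python (the port is a placeholder; the same setsockopt applies to Winsock in C):

    import socket
    import struct

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("", 9000))    # placeholder port
    listener.listen(16)

    conn, addr = listener.accept()           # the handshake has already completed here
    print("rejecting connection from", addr)

    # SO_LINGER with l_onoff=1, l_linger=0: close() sends an RST instead of a FIN,
    # so no FIN/TIME_WAIT state is kept for this connection.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()

Note this still differs from SO_CONDITIONAL_ACCEPT: the handshake has completed, so the client briefly sees an established connection before the RST arrives.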
I can't comment on the Windows side of things, but as far as TCP is concerned, rejecting a connection is a bit different from disconnecting from it.
For one, disconnecting from an established connection means more resources were already "consumed" (e.g. port state maintained in firewalls and endpoints, forwarding capacity used in switches/routers, etc.) in both the network and the hosts. Rejecting a connection is less resource-intensive.
