How to Reduce TCP delays caused by ARP flushes for MODBUS TCP - tcp

We have an application that periodically sends TCP messages at a defined rate (using MODBUS TCP). If a message is not received within a set period, an alarm is raised. However, every once in a while there appears to be a delay in messages being received. Investigation has shown that this is associated with the ARP cache being refreshed, which causes a resend of the TCP message.
The IP stack provider has told us that this is the expected behaviour for TCP. The questions are:
Is this expected behaviour for an IP stack? If not, how do other stacks work around the period when IP/MAC address translation is not available?
If this is the expected behaviour, how can we reduce the delay in TCP messages during this period? (Permanent ARP entries have been tried, but are not the best solution.)

In my last job I worked with a company building routers and switches. Our implementation would queue packets waiting for ARP replies and send them when the ARP reply was received. Therefore, no TCP retransmit required.
Retransmission in TCP occurs when an ACK is not received within a given time. If the ARP reply takes a long time, or is itself lost, you might be getting a retransmission even though the device waiting for the ARP reply is queuing the packet.
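To make the queue-while-resolving behaviour concrete, here is a minimal sketch of it as a toy model. It is not any vendor's implementation: the class name `ArpPendingQueue`, its methods, and the per-address queue limit are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class ArpPendingQueue:
    """Toy model of a sender that holds packets while ARP resolution is pending."""

    def __init__(self, max_pending_per_ip=4):
        self.cache = {}                      # ip -> mac for resolved entries
        self.pending = defaultdict(deque)    # ip -> packets waiting for an ARP reply
        self.max_pending_per_ip = max_pending_per_ip
        self.sent = []                       # (mac, packet) pairs "put on the wire"
        self.arp_requests = []               # addresses we have sent a request for

    def send(self, ip, packet):
        mac = self.cache.get(ip)
        if mac is not None:
            self.sent.append((mac, packet))
            return "sent"
        queue = self.pending[ip]
        if not queue:
            # The first packet for an unresolved address triggers one ARP request.
            self.arp_requests.append(ip)
        if len(queue) >= self.max_pending_per_ip:
            return "dropped"                 # a full queue leaves recovery to TCP
        queue.append(packet)
        return "queued"

    def on_arp_reply(self, ip, mac):
        # Resolution finished: record the mapping and flush the held packets in order.
        self.cache[ip] = mac
        for packet in self.pending.pop(ip, deque()):
            self.sent.append((mac, packet))

    def expire(self, ip):
        # Cache entry timed out; the next send has to resolve the address again.
        self.cache.pop(ip, None)
```

With a queue like this, a packet sent just after `expire()` is delayed only by the ARP round trip. A stack that drops the packet instead has to wait for the TCP retransmission timer, which is usually much longer.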
It would appear from your question that the period of the TCP message is shorter than the ARP refresh time. This implies that reuse of the ARP is not causing it to stay refreshed, which is possible behaviour that would be helpful in your situation.
A packet trace of the situation occurring could be helpful - are you actually losing the first packet? How long does the ARP reply take?
In order to stop the ARP cache timing out, you might want to try to find something that will refresh it, such as another ARP request for the same address, or a gratuitous ARP.
I found a specification for MODBUS TCP but it didn't help. Can you post some details of your network - media, devices, speeds?

Your description suggests that the peer ARP entries expire between TCP segments and cause some subsequent segments to fail due to the lack of a current MAC destination.
If you have the MODBUS devices on a separate subnet, then perhaps the destination router will be kind enough to queue the segment until it receives a valid MAC. If you cannot use a separate subnet, you could try to force the session to have keep-alives activated - this would cause a periodic empty message to be sent that would keep the ARP timers resetting. If the overhead of the keep-alive is too high and you completely control the application in your system, you could try to force zero-length messages through to the peer.
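If the application is written against a BSD-style socket API, turning on keep-alives might look like the sketch below. The helper name `enable_keepalive` and the timer values are assumptions for illustration. The three `TCP_KEEP*` options are platform specific (they exist on Linux), so each one is applied only when the Python build exposes it. Whether a keep-alive probe actually refreshes the ARP entry depends on the stack, so this needs checking with a packet trace.

```python
import socket


def enable_keepalive(sock, idle_s=5, interval_s=5, probes=3):
    """Turn on TCP keep-alive and tune its timers where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = []
    # Timer options are not available everywhere; skip the ones that are missing.
    for name, value in (("TCP_KEEPIDLE", idle_s),      # idle time before first probe
                        ("TCP_KEEPINTVL", interval_s),  # gap between probes
                        ("TCP_KEEPCNT", probes)):       # failed probes before giving up
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied
```

The idle time would have to be shorter than the ARP cache timeout for this to help. Keep-alive probes are only sent on an otherwise idle connection, so they fill gaps between messages rather than adding to a busy one.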

Related

Why do XMPP messages sometimes get lost on mobile devices

This question asks what to do about losing XMPP messages on mobile devices when they don't have a stable connection, but I don't really see why the messages get lost in the first place.
I remember having read that the stream between the server and the client stays open when the connection is suddenly lost and will only be destroyed once the connection times out. This means that the server sends arriving messages over the stream, even though the disconnected client can't receive those messages anymore.
I was happy with that explanation for some time, but started wondering why core XMPP would be lacking such an important feature. Eventually I realised that ensuring correct transmission in the XMPP protocol would be redundant, as the underlying TCP should already ensure the proper transmission of the message. Given the various problems that arise from lost messages, however, it seems that this isn't true.
Why isn't TCP enough to ensure that the message is either correctly sent or fails properly so the server knows it has to send the message later?
Why isn't TCP enough to ensure a proper transmission (or proper error handling, so the server knows the message has to be sent again) in this scenario?
The application hands the data that needs to be sent to its TCP layer. TCP segments the data as needed and sends the segments out on the established connection. The application thereby passes the burden of ensuring that the data reaches the other end to TCP. (This does not mean that the application should not have retransmissions of its own. An application-level protocol can define a resend of messages if the right response doesn't come.)
TCP has a retransmission mechanism. Every packet sent to the peer needs to be acknowledged. Until the acknowledgements come in, TCP keeps the packets in its send queue. Once the acknowledgement for the sent packets is received, they are removed.
If there is packet loss, acknowledgements don't arrive, so TCP retransmits. Eventually it gives up and notifies the application, which needs to take action. Packet loss can happen beyond TCP's control, so TCP provides a best-effort reliable service.
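The send-queue behaviour described above can be sketched as a toy model. This is not how any real TCP stack is written: the class `SendQueue`, the fixed retry limit, and the use of a plain list as the "wire" are all assumptions for illustration.

```python
class SendQueue:
    """Toy model of a sender that keeps data until it is acknowledged."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.unacked = {}       # seq -> [payload, retransmit_count]
        self.failed = False

    def send(self, seq, payload, wire):
        # Keep a copy so the data can be retransmitted if no ACK arrives.
        self.unacked[seq] = [payload, 0]
        wire.append((seq, payload))

    def on_ack(self, seq):
        # Acknowledged data no longer needs to be held.
        self.unacked.pop(seq, None)

    def on_timeout(self, wire):
        """Retransmit everything outstanding, or give up once retries run out."""
        for seq, entry in sorted(self.unacked.items()):
            if entry[1] >= self.max_retries:
                self.failed = True      # the application would be notified here
                return False
            entry[1] += 1
            wire.append((seq, entry[0]))
        return True
```

When the failure is reported, the sender cannot tell whether the data or only the acknowledgements were lost. That is why a protocol on top of TCP may still want its own acknowledgements and resends.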

What is 'TCP out-of-order' and 'TCP port number reused' issue?

I am sending HTTP requests from IP_ADDR1 to IP_ADDR2. I observed that the HTTP requests are not reaching the application level. When I took Wireshark logs I noticed some issues at the TCP level. What are these issues? When do they occur? How do I get rid of them? Attaching the Wireshark snapshot here.
'TCP port number reused' means that it saw a successful connection handshake, then the client sent another SYN packet with the same port numbers. If the client hadn't already acknowledged the SYN-ACK, this would have been reported as a retransmission. But since it did acknowledge the SYN-ACK, it shouldn't need to retransmit the SYN. This could mean that something on your network is duplicating packets.
'TCP out-of-order' means that the packets aren't being received in the order that their sequence numbers indicate. It might be a side effect of the duplicate packet that's causing the reused port number error -- that may be resetting the sequence numbers back to the beginning of the connection. Because otherwise it looks like the packet is in order; an HTTP command should be the next thing after the connection handshake.
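The basic idea of "out of order" can be shown with a deliberately simplified sketch. It is not Wireshark's actual analysis, which also uses timing and other state to tell out-of-order segments from retransmissions. The function name `find_out_of_order` and the (sequence number, length) input format are assumptions, and sequence-number wrap-around is ignored.

```python
def find_out_of_order(segments):
    """Return indexes of segments that start before data already seen.

    segments is a list of (sequence_number, payload_length) in arrival order.
    """
    flagged = []
    next_expected = None
    for index, (seq, length) in enumerate(segments):
        # A segment that starts below the highest byte seen so far arrived late.
        if next_expected is not None and seq < next_expected:
            flagged.append(index)
        end = seq + length
        if next_expected is None or end > next_expected:
            next_expected = end
    return flagged
```

For example, with arrivals `[(1, 100), (201, 100), (101, 100), (301, 100)]` only the third segment (index 2) is flagged, because it starts at 101 after bytes up to 300 have already been seen.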

Is there a way to verify whether the packets are received at a switch?

I have a router1-switch-router2 connection. My problem is that if I send a packet from router1 to router2, it is not received at router2. I am sure the IP addresses and subnets are correct, that packets are going out of router1, and that the internal port connections of the switch are right. I have access to the on-path switch. Is there any specific command that can be used on the switch to check whether the packet is received or not? ARP itself is not getting resolved.
You can run a packet capture application on both the sender and the receiver, which will show the incoming and outgoing packets on both boxes.
In this case your packet is probably getting dropped on either the sender or the receiver side. There can be a million reasons for a packet drop, but this is a good step to start with.

Failure scenarios for reliable UDP?

What would be a good list of failure scenarios for testing a reliable UDP layer? I have thought of the cases below:
Drop data packets
Drop ACK and NAK packets
Send packets out of sequence
Drop initial handshake packets
Drop close/shutdown packets
Duplicate packets
Please help identify other cases that a reliable UDP layer needs to handle.
The list you've given sounds pretty good. Also think about:
Very delayed packets (where most packets come through fine, but one or two are delayed by several minutes);
Very delayed duplicates (where the original came through quickly, but the duplicate arrived after several minutes delay);
Silent dropping of all packets above a certain size (both unidirectional and bidirectional cases);
Highly variable delays;
Sequence number wrapping tests.
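Several of these cases can be driven from one fault-injection helper placed between the sender and the receiver under test. The sketch below is an assumption about how such a harness might look: the function name `impair` and its parameters are hypothetical, and it covers loss, duplication, reordering, and corruption but not delay. The random generator is seeded so that a failing run can be replayed.

```python
import random


def impair(packets, drop=0.0, duplicate=0.0, reorder=0.0, corrupt=0.0, seed=0):
    """Return a new list of bytes packets with random faults applied."""
    rng = random.Random(seed)
    out = []
    for packet in packets:
        if rng.random() < drop:
            continue                     # packet silently lost
        if packet and rng.random() < corrupt:
            # Invert one byte so that checksum handling gets exercised.
            index = rng.randrange(len(packet))
            packet = packet[:index] + bytes([packet[index] ^ 0xFF]) + packet[index + 1:]
        out.append(packet)
        if rng.random() < duplicate:
            out.append(packet)           # duplicate arrives right behind the original
    # Swap neighbouring packets to simulate out-of-order delivery.
    i = 0
    while i + 1 < len(out):
        if rng.random() < reorder:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out
```

Each probability is applied per packet, so `impair(pkts, drop=0.1, seed=42)` loses roughly one packet in ten and loses the same ones every time that seed is used.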
Have you tried intentionally corrupting packets in transit?
Also, have you considered a scenario where only one-way communication is possible? In this case, the sending host thinks that the send failed, but the receiving end successfully processes the message. For instance:
1. Host A sends a message to host B.
2. B successfully receives the message and replies with an ACK.
3. The ACK gets dropped in the network.
4. A waits for the timeout and re-sends the message (repeats steps 1-3).
5. Host A exceeds its retry count and thinks the send failed, but host B has in fact processed the message.
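The lost-ACK scenario can be reproduced in a few lines. In this sketch the names `Receiver` and `send_with_retries` and the `ack_is_lost` callback are hypothetical. It shows one common precaution, which is to have the receiver de-duplicate by sequence number so that retransmissions are processed at most once.

```python
class Receiver:
    """Receiver that processes each sequence number at most once."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def receive(self, seq, payload):
        if seq not in self.seen:         # retransmitted duplicates are not reprocessed
            self.seen.add(seq)
            self.processed.append(payload)
        return ("ACK", seq)              # always re-ACK so the sender can stop retrying


def send_with_retries(receiver, seq, payload, ack_is_lost, max_attempts=3):
    """Return True if an ACK reached the sender within max_attempts."""
    for attempt in range(max_attempts):
        ack = receiver.receive(seq, payload)
        if not ack_is_lost(attempt):
            return ack == ("ACK", seq)
    return False                         # sender gives up, unsure of delivery
```

If every ACK is lost, `send_with_retries` returns `False` even though the receiver's `processed` list contains the message exactly once. A test for this case should check both sides.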
I had thought that UDP is a connectionless and unreliable protocol that does not require any specific transport handshake between hosts, and hence that there is no such thing as a reliable UDP protocol.

What happens to a TCP packet if the server is terminated?

I know that TCP is very reliable, and whatever is sent is guaranteed to get to its destination. But what happens if, after a packet is sent but before it arrives at the server, the server goes down? Is the acknowledgment that the packet was successfully sent based on the server's existence when the packet is initially sent, or on the packet successfully arriving at the server?
Basically what I'm asking is - if the server goes down in between the sending and the receiving of a packet, would the client know?
It really doesn't matter, but here's some finer details:
You need to distinguish between the Server-Machine going down and the Server-Process going down.
If the Server-Machine has crashed, then, clearly, there is nothing to receive the packet. The sending client will get no retry-requests, and no acknowledgment of success or failure. After having not received any feedback at all, the client will eventually receive a timeout, and consider the connection dropped. This is pretty much identical to the cable being physically cut unexpectedly.
If, however, the Server-Machine remains functioning, but the Server-Process crashes due to a programming bug, then the receiving TCP stack, which is a function of the OS, not of the process, will likely ACK the packet, and any others that arrive. This will continue until the OS notifies the TCP stack that the process is no longer active. The TCP stack will then likely send a RST (reset) notice to the client, or may drop the connection (as described above).
This is basically what happens. The full reality is hard to describe without getting tied up in unnecessary detail.
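From the client's side, the outcomes described above show up as different results of a read on the socket. The sketch below is one way to tell them apart. The function name `probe_connection` and its return labels are assumptions, and a timeout alone cannot distinguish a crashed machine from a peer that simply has nothing to say.

```python
import socket


def probe_connection(sock, timeout_s=1.0):
    """Classify what a blocking read on a connected TCP socket reveals."""
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        return "silent"      # nothing heard: peer idle, or machine/cable gone
    except ConnectionResetError:
        return "reset"       # the peer's TCP stack answered with a RST
    if data == b"":
        return "closed"      # orderly close (FIN) sent by the peer's stack
    return "data"
```

A client that needs to notice a dead server machine quickly usually combines a check like this with its own request timeout or with TCP keep-alives, since the "silent" case can otherwise last a long time.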
TCP manages connections which are defined as a 4-tuple (source-ip, source-port, dest-ip, dest-port).
When the server closes the connection first, its end of the connection is placed into the TIME_WAIT state, in which the 4-tuple cannot be re-used for a certain time. That time is twice the maximum segment lifetime (2 x MSL), that is, double the longest time a segment is assumed to survive in the network. Any packets for the old connection that arrive during that time are discarded by TCP itself.
So, when the connection becomes available for re-use, all packets have been destroyed (anywhere on the network) either by:
being received at the destination and thrown away due to the TIME_WAIT state; or
being destroyed by packet forwarders on the net due to expired lifetime.
When you send a packet to the network there is never a guarantee that it will get safely to the other side. The reliability of TCP is achieved exactly as you suggest, using acknowledgment packets.
