Why do XMPP messages sometimes get lost on mobile devices

Why do XMPP messages sometimes get lost on mobile devices - tcp

This question asks what to do about loosing XMPP messages on mobile devices when they don't have a stable connection, but I don't really see why the packages get lost in the first place.
I remember having read that the stream between the server and the client stays open when the connection is suddenly lost and will only be destroyed once the connection times out. This means that the server sends arriving messages over the stream, even though the disconnected client can't receive those messages anymore.
I was happy with that explanation for some time, but started wondering why core XMPP would be lacking such an important feature. Eventually I noticed that ensuring correct transmission in the XMPP protocol would be redundant, as the underlying TCP should already ensure the proper transmission of the message, but as the various problems that arise from the lost message it seems that this isn't true.
Why isn't TCP enough to ensure that the message is either correctly sent or fails properly so the server knows it has to send the message later?

Why isn't TCP enough to ensure a proper transmission (or proper error handling, so the server knows the message has to be sent again) in this scenario?
Application gives the data that needs to be sent across to its TCP. TCP segments the data as needed and sends them out on established connection. Application passes over the burden of ensuring the packet reaches the other end to TCP. ( This does not mean,application should not have re-transmissions. Application level protocol can define re-send of messages if right response didn't come)
TCP has the mechanism of the Re-transmissions. Every packet sent to peer needs to be acknowledged. Until the acknowledgements come in TCP shall have the packets in its sendQ. Once the acknowledgement for the packets sent is received, they are removed.
If there is packet loss, acknowledgements don't arrive. TCP does the re-transmissions. Eventually gives up.Notifies application which needs to take action. Packet loss can happen beyond TCPs control. Thus TCP provides best-effort reliable service.

Related

BLE indications

As I understand, BLE indications are a reliable communications method. How do you know if your indications was not communicated. I am writing code for the peripheral/server and currently when I send a notifications, I get a manual response from the central. I read that if I use indications, the acknowledges take place in the L2CAP layer automatically and communications is therefore faster, but how does my embedded controller know the Bluetooth module was not successful at getting the packet across the link? We are using the Microchip RN4030 Bluetooth module.

Let's make things clear.
The BLE stack looks roughly like the following. The stack has these layers in this order:
Link Layer
HCI (if controller and host are separated)
L2CAP
ATT
GATT
Application
The Link Layer is a reliable protocol in the sense that all packets are protected by a CRC and every packet is acknowledged by the receiving device. If a packet is not acknowledged, it is resent until an acknowledge is received. There can also only be one outstanding packet, which means no reordering of packets are possible. Acknowledges are normally not being presented to upper layers.
The HCI layer is the communication protocol between the controller and the host.
The L2CAP layer does almost nothing if you use the standard MTU size of 23. It has a length header and a type code ("channel") which indicates what type of packet is being sent (usually ATT).
On the ATT level, there are two types of packets that are sent from the server that are not sent as a response to a client request. Those are notifications and indications. Sending one notification or indication has the same "performance" since the type is just a tag of a packet that's sent over the lower layers. The differences are listed below:
Indications:
When an indication packet is sent to the client, the client must acknowledge the packet by sending a confirmation packet when it has received the indication packet. Even if the indication packet is invalid, a confirmation shall be sent back.
Upper layers are not involved sending back the confirmation.
The server may not send a new indication until it has received a confirmation from the previous one.
The only time you won't receive a confirmation after an indication is if the link is dropped (you should then get a disconnected event), or there is some bug in some of the BLE stacks.
After the app has sent an indication, most BLE stack confirms to the app that that a confirmation has been received by the client as that the indication operation has completed.
Notifications:
No ATT layer acknowledges are sent.
"Commands and notifications that are received but cannot be processed, due to buffer overflows or other reasons, shall be discarded. Therefore, those PDUs must be considered to be unreliable." However I have never noticed an implementation actually following this rule, i.e. all notifications are delivered to the application. Or I've never hit the max buffer size.
The GATT layer is mostly a definition of how the attribute protocol should be used and defines a DB structure of characteristics.
Implications
According to my opinion, there are several flaws or issues with the BLE standard. There are two things that might make indications useless and unreliable in practice:
There are no way to send back an error response instead of a confirmation.
The fact that it is the ATT layer that sends back the confirmation directly when it has received the indication, and not the app when it has successfully handled the indication.
This means that if for example, some bug or other issue causing that the BLE stack couldn't send the indication to the app, or your app crashed, or your app found your value to be invalid, there is no way your peripheral can aware of that. Since it got the confirmation it thinks everything is fine.
I can't understand why they defined indications this way. Since the app doesn't send the confirmation but a lower layer does, there seems to be no point at all in having an ATT layer acknowledge instead of just using the Link Layer acknowledge. Both are just acknowledges that the packet has been received halfway of its destination.
If we draw a parallel to a HTTP POST and internet, we could consider the Link Layer acknowledge as when the network card of the destination receives the request and the ATT confirmation as a confirmation that the TCP stack received the packet. You have no way of knowing that your web server software actually did receive your request, and it processed it with success.
The fact that notifications are allowed to be dropped by the receiver is also bad. Normally notifications are used if the peripheral wants to stream a lot of data to the central and then you don't want dropped packets. They should have designed the flow control so that the Link Layer stopped acknowledge incoming packets instead until the app are ready to process the next notifications. This is even already implemented at the LL + HCI + Host layers.
Windows
One interesting thing about at least the Windows BLE stack is, if it receives indications faster than the app processes them it starts to drop the indications as well, even though only notifications should be allowed to be dropped due to "buffer overflows or other reasons". It buffers at most 512 indications in my tests.
That said
Just use notifications and if you want some kind of confirmation, let the client send a write packet when it has received your data and successed processing it.

How to Reduce TCP delays caused by ARP flushes for MODBUS TCP

We have an application which is periodically sending TCP messages at a defined rate(Using MODBUS TCP). If a message is not received within a set period an alarm is raised. However every once in a while there appears to be a delay in messages being received. Investigation has shown that this is associated with the ARP cache being refreshed causing a resend of the TCP message.
The IP stack provider have told us that this is the expected behaviour for TCP. The questions are,
Is this expected behaviour for an IP stack? If not how do other stacks work around the period when IP/MAC address translation is not available
If this is the expected behaviour how can we reduce the delay in TCP messages during this period?(Permanent ARP entries have been tried, but are not the best solution)

In my last job I worked with a company building routers and switches. Our implementation would queue packets waiting for ARP replies and send them when the ARP reply was received. Therefore, no TCP retransmit required.
Retransmission in TCP occurs when an ACK is not received within a given time. If the ARP reply takes a long time, or is itself lost, you might be getting a retransmission even though the device waiting for the ARP reply is queuing the packet.
It would appear from your question that the period of the TCP message is shorter than the ARP refresh time. This implies that reuse of the ARP is not causing it to stay refreshed, which is possible behaviour that would be helpful in your situation.
A packet trace of the situation occurring could be helpful - are you actually losing the first packet? How long does the ARP reply take?
In order to stop the ARP cache timing out, you might want to try to find something that will refresh it, such as another ARP request for the same address, or a gratuitous ARP.
I found a specification for MODBUS TCP but it didn't help. Can you post some details of your network - media, devices, speeds?

Your description suggests that the peer ARP entries expire between TCP segments and cause some subsequent segments to fail due to the lack of a current MAC destination.
If you have the MODBUS devices on a separate subnet, then perhaps the destination router will be kind enough to queue the segment until it receives a valid MAC. If you cannot use a separate subnet, you could try to force the session to have keep-alives activated - this would cause a periodic empty message to be sent that would keep the ARP timers resetting. If the overhead of the keep-alive is too high and you completely control the application in your system, you could try to force zero-length messages through to the peer.

When does a Java socket send an ack?

My question is that when a socket at the receiver-side sends an ack? At the time the application read the socket data or when the underlying layers get the data and put it in the buffer?
I want this because I want both side applications know whether the other side took the packet or not.

It's up to the operating system TCP stack when this happens, since TCP provides a stream to the application there's no guarenteed 1:1 correlation between the application doing read/writes and the packets sent on the wire and the TCP acks.
If you need to be assured the other side have received/processed your data, you need to build that into your application protocol - e.g. send a reply stating the data was received.

TCP ACKs are meant to acknowledge the TCP packets on the transmission layer not the application layer. Only your application can signal explicitly that it also has processed the data from the buffers.

TCP/IP (and therefor java sockets) will guarantee that you either successfully send the data OR get an error (exception in the case of java) eventually.

How does TCP/IP report errors?

How does TCP/IP report errors when packet delivery fails permanently? All Socket.write() APIs I've seen simply pass bytes to the underlying TCP/IP output buffer and transfer the data asynchronously. How then is TCP/IP supposed to notify the developer if packet delivery fails permanently (i.e. the destination host is no longer reachable)?
Any protocol that requires the sender to wait for confirmation from the remote end will get an error message. But what happens for protocols where a sender doesn't have to read any bytes from the destination? Does TCP/IP just fail silently? Perhaps Socket.close() will return an error? Does the TCP/IP specification say anything about this?

TCP/IP is a reliable byte stream protocol. All your bytes will get to the receiver or you'll get an error indication.
The error indication will come in the form of a closed socket. Regardless of what the communication pattern (who does the sending), if the bytes can't be delivered, the socket will close.
So the question is, how do you see the socket close? If you're never reading, you'd eventually get an error trying to write to the closed socket (with ECONNRESET errno, I think).
If you have a need to sleep or wait for input on another file handle, you might want to do your waiting in a select() call where you include the socket in the list of sources you're waiting on (even if you never expect to receive anything). If the select() indicates that the socket is ready for a read call, you may get a -1 return (with ECONNRESET, I think). An EOF would indicate an orderly close (other side did a shutdown() or close().
How to distinguish this error close from a clean close (other program exiting, for example)? The errno values may be enough to distinguish error from orderly close.
If you want an unambiguous indication of a problem, you'll probably need to build some sort of application level protocol above the socket layer. For example, a short "ack" message sent by the receiver back to the sender. Then the violation of that higher level application protocol (sender didn't see an ack) would be a confirmation that it was an error close vs a clean close.

The sockets API has no way of informing the writer exactly how many bytes have been received as acknowledged by the peer. There are no guarantees made by the presence of a successful shutdown or close either.
The TCP/IP specification says nothing about the application interface (which is nearly always the sockets API).
SCTP is an alternative to TCP which attempts to address these shortcomings, among others.

In C, if you write to a socket that has failed with send(), you will get back the number of bytes that were sent. If this does not match the number of bytes you meant to send, then you have a problem. But also, when you write to a failed socket, you get SIGPIPE back. Before you start socket handling, you need to have a signal handler in place that will alert you when you get SIGPIPE.
If you are reading from a socket, you really should wrap it with an alarm so you can timeout. Like "alarm(timeout_val); recv(); alarm(0)". Check the return code of recv, and if it's 0, that indicates that the connection has been closed. A negative return result indicates a read failure and you need to check errno.

TCP is built upon the IP protocol, which is the centerpiece for the Internet, providing much of the interoperability that drives Routing, which is what determines how to get packets from their source to their destination. The IP protocol specifies that error messages should be sent back to the sender via Internet Control Message Protocol(ICMP) in the case of a packet failing to get to the sender. Some of these reasons include the Time To Live(TTL) field being decremented to zero, often meaning that the packet got stuck in a routing loop, or the packet getting dropped due to switch contention causing buffer overruns. As others have said, it is the responsibility of the Socket API that is being used to relay these errors at the IP layer up to the application interacting with the network at the TCP layer.

TCP/IP packets are either raw, UDP, or TCP. TCP requires each byte to be acked, and it will re-transmit bytes that are not acked in time. raw, and UDP are connectionless (aka best effort), so any lost packets (barring some ICMP cases, but many of these get filtered for security) are silently dropped. Upper layer protocols can add reliability, such as is done with some raw OSPF packets.

What happens to a TCP packet if the server is terminated?

I know that TCP is very reliable, and what ever is sent is guaranteed to get to its destination. But what happens if after a packet is sent, but before it arrives at the server, the server goes down? Is the acknowledgment that the packet is successfully sent triggered on the server's existence when the packet is initially sent, or when the packet successfully arrives at the server?
Basically what I'm asking is - if the server goes down in between the sending and the receiving of a packet, would the client know?

It really doesn't matter, but here's some finer details:
You need to distinguish between the Server-Machine going down and the Server-Process going down.
If the Server-Machine has crashed, then, clearly, there is nothing to receive the packet. The sending client will get no retry-requests, and no acknowledgment of success or failure. After having not received any feedback at all, the client will eventually receive a timeout, and consider the connection dropped. This is pretty much identical to the cable being physically cut unexpectedly.
If, however, the Server-Machine remains functioning, but the Server-Process crashes due to a programming bug, then the receiving TCP stack, which is a function of the OS, not of the process, will likely ACK the packet, and any others that arrive. This will continue until the OS notifies the TCP stack that the process is no longer active. The TCP stack will likely send a RST (reset) notice to the client, or may drop the connection (as described above)

This is basically what happens. The full reality is hard to describe without getting tied up in unnecessary detail.
TCP manages connections which are defined as a 4-tuple (source-ip, source-port, dest-ip, dest-port).
When the server closes the connection, the connection is placed into a TIME_WAIT2 state where it cannot be re-used for a certain time. That time is double the maximum time-to-live value of the packets. Any packets that arrive during that time are discarded by TCP itself.
So, when the connection becomes available for re-use, all packets have been destroyed (anywhere on the network) either by:
being received at the destination and thrown away due to TIME_WAIT2 state; or
being destroyed by packet forwarders on the net due to expired lifetime.

When you send a packet to the network there is never a grantee it will get safely to the other side. The reliability of TCP is achieved exactly as you suggest using acknowledgment packets.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex