Reduce occurrence of Instant Passed (0x28) BLE disconnection error - bluetooth-lowenergy

I am developping an application on the STM32 SPBTLE-1S module (BLE 4.2). The module connects to a Raspberry Pi.
When the connection quality is low, a disconnection will sometimes occur with error code 0x28 (Reason: Instant Passed) before the connection timeout is reached.
Current connection settings are:
Conn_Interval_Min: 10
Conn_Interval_Max: 20
Slave_latency: 5
Timeout_Multiplier: 3200
Reading more on this type of error, it seems to happen when "an LMP PDU or LL PDU that includes an instant cannot be performed because the instant when this would have occurred has passed." These paquets are typically used for frequency hopping or for connection updates. In my case, they must be frequency hoping paquets.
Any idea on how to prevent these disconnections caused by "Instant Passed" errors? Or are they simply a consequence of the BLE technology?

Your question sounds similar to this one
In a nutshell, there's only two possible link layer requests that can result in this type of disconnect (referred to as LL_CONNECTION_UPDATE_IND & LL_CHANNEL_MAP_IND in the latest v5.2 Bluetooth Core Spec)
If you have access to the low level firmware for the bluetooth stack on the embedded device, what I've done in the past is increase the number of slots in the future the switch "Instant" is at so there's more time for the packet to go through in a noisy environment.
Otherwise the best you can do is try to limit the amount of times you change connection parameters to make that style of disconnect less likely to occur. (The disconnect could still be triggered by channel map change but I haven't seen many BLE stacks expose a lot of configuration over when those take place.)

Related

Need advice on serial port software access implementation on single CPU core device

I am designing the software controlling several serial ports, operating system is OpenWrt. The device application is running in is single core ARM9 # 450 MHz. Protocol for serial ports is Modbus.
The problem is with Modbus slave implementation. I designed it in real-time manner, looping reading data from serial port (port is open in non-blocking mode with 0 characters to wait for and no timeouts). The sequence/data stream timeout is about 4 milliseconds # 9600/8/N/1 (3.5 characters as advised). Timeout is checked if application does not see anything in the buffer, therefore if application is slower that incoming stream of characters, timeouts mechanism will not take place - until all characters are removed from the buffer and bus is quiet.
But I see that CPU switches between threads and this thread is missed for about 40-60 milliseconds, which is a lot to measure timeouts. While I guess serial port buffer still receives the data (how long this buffer is?), I am unable to assess how much time passed between chars, and may treat next message as continuation of previous and miss Modbus request targeted for my device.
Therefore, I guess, something must be redesigned (for the slave - master is different story). First idea coming to my mind is to forget about timeouts, and just parse the incoming data after being synchronized with the whole stream (by finding initial timeout). However, there're several problems - I must know everything about all types of Modbus messages to parse them correctly and find out their ends and where next message starts, and I do not see the way how to differentiate Modbus request with Modbus response from the device. If developers of the Modbus protocol would put special bit in command field identifying if message is request or response... but it is not the case, and I do not see the right way to identify if message I am getting is request or response without getting following bytes and checking CRC16 at would-be byte counts, it will cost time while I am getting bytes, and may miss window for the response to request targeted for me.
Another idea would be using blocking method with VTIME timeout setting, but this value may be set to tenths of seconds only (therefore minimal is 100 ms), and this is too much given another +50 ms for possible CPU switching between threads, I think something like timeout of 10 ms is needed here. It is a good question if VTIME is hardware time, or software/driver also subject to CPU thread interruptions; and the size of FIFO in the chip/driver how many bytes it can accumulate.
Any help/idea/advice will be greatly appreciated.
Update per #sawdust comments:
non-blocking use has really appeared not a good way because I do not poll hardware, I poll software, and the whole design is again subject to CPU switching and execution scheduling;
"using the built-in event-based blocking mode" would work in blocking mode if I would be able to configure UART timeout (VTIME) in 100 us periods, plus if again I would be sure I am not interrupted during the executing - only then I will be able to have timing properly, and restart reading in time to get next character's timing assessed properly.

mqtt paho network loop unnecessary?

I have seen a number of examples of paho clients reading sensor data then publishing, e.g., https://github.com/jamesmoulding/motion-sensor/blob/master/open.py. None that I have seen have started a network loop as suggested in https://eclipse.org/paho/clients/python/docs/#network-loop. I am wondering if the network loop is unnecessary for publishing? Perhaps only needed if I am subscribed to something?
To expand on what #hardillb has said a bit, his point 2 "To send the ping packets needed to keep a connection alive" is only strictly necessary if you aren't publishing at a rate sufficient to match the keepalive you set when connecting. In other words, it's entirely possible the client will never need to send a PINGREQ and hence never need to receive a PINGRESP.
However, the more important point is that it is impossible to guarantee that calling publish() will actually complete sending the message without using the network loop. It may work some of the time, but could fail to complete sending a message at any time.
The next version of the client will allow you to do this:
m = mqttc.publish("class", "bar", qos=2)
m.wait_for_publish()
But this will require that the network loop is being processed in a separate thread, as with loop_start().
The network loop is needed for a number of things:
To deal with incoming messages
To send the ping packets needed to keep a connection alive
To handle the extra packets needed for high QOS
Send messages that take up more than one network packet (e.g. bigger than local MTU)
The ping messages are only needed if you have a low message rate (less than 1 msg per keep alive period).
Given you can start the network loop in the background on a separate thread these days, I would recommend starting it regardless

Signalr LongPollDelay and the buffer

We have Safari mobile clients that are affected by one of their 5 connections being blocked by signalr. We have used the solution propped here: https://github.com/SignalR/SignalR/issues/1406#issuecomment-14284093
Where we have these settings changed to the following for signalR 2.x
GlobalHost.Configuration.ConnectionTimeout =
TimeSpan.FromMilliseconds(1000);
GlobalHost.Configuration.LongPollDelay = TimeSpan.FromMilliseconds(5000);
We are sending notifications from the server to the client with no message queue or acknowledgement framework. We don’t need to guarantee message delivery but we do want there to be a high probability of success. We think this should be possible due to our low message rate and a buffer size of 1000. However we have some questions:
Are messages held in a queue while the LongPollDelay occurs? Should
they be sent during the next long poll using the settings above?
Our tests with a single message being sent during a 2 minute
LongPollDelay suggest that they are not retrieved during the 1
second long poll request that follows. Are there any reasons for
this i.e. buffer flushing after 1 minute?
Does ConnectionTimeout affect all transports?
If ConnectionTimeout applies to all transports is there a way of
setting this for only Safari mobile users i.e. have two connections
available and use agent detection to point to a specific connection?
Is there a way of setting the LongPollDelay so that this also only
applied to only Safari mobile users?
All advice welcome and appreciated, Matt
[FOLLOW-UP QUESTIONS]
Thanks that helps a lot. We have retried with 30secs LongPollDelay and it works as expected. I have a couple of follow-up questions that you/someone might care to comment on:
1) During testing we also see the client sending a ping request to the server roughly every 5 minutes. Why is the ping period set to 5 minutes when the disconnect period is so much shorter, and what is the purpose of the client pinging the server if it assumes it is disconnected via an alternative mechanism.
2) w.r.t. Different configurations for different clients. Could we not set up another SignalR endpoint and point only Safari mobile to this? Something like the response to this post:
Can I reduce the Circular Buffer to "1"? Is that a good idea?
You are correct that the SignalR will queue/buffer messages. Even if there wasn't a LongPollDelay configured, SignalR needs to do this because there is always a chance that messages are sent while clients are repolling/reconnecting.
SignalR assumes that the client has disconnected if the client hasn't been connected to the server within the last DisconnectTimeout. Once the DisconnectTimeout triggers, SignalR will call OnDisconnected and clear any message buffers belonging to the supposedly disconnected client so it doesn't leak memory. The DisconnectTimeout defaults to 30 seconds which is far less than the 2 minute LongPollDelay you configured, so that explains this behavior.
The ConnectionTimeout only affects long polling unless you've disabled keep alives. If keep alives are disabled, it applies to all transports.
There is no way to selectively configure the ConnectionTimeout for specific types of clients. But as I stated, it only affects long polling by default.
There is no way to selective configure the LongPollDelay for specific types of clients.

MODBUS, is there a maximum time a device can take to respond?

In talking to a MODBUS device, is there an upper bound on how long a device can take to respond before it's considered a timeout? I'm trying to work out what to set my read timeout to. Answers for both MODBUS RTU and TCP would be great.
In MODBUS over serial line specification and implementation guide V1.0 section 2.5.2.1 MODBUS Message ASCII Framing There are suggestions that inter-character delays of up to 5 seconds are reasonable in slow WAN configurations.
2.6 Error Checking Methods indicates that the timeouts are configured without specifying any values.
The current Modicon Modbus Protocol Reference Guide PI–MBUS–300 Rev. J also provides no quantitative suggestions for these settings.
Your application time-sensitivity, along with the constraints that your network enforces, will largely determine your choices.
If you identify the worst-case delays you can tolerate, take half that time to allow a single retransmission to fail, subtract reasonable transmission times for a message of maximal length, then you should have a good candidate for a timeout. This will allow you to recover from a single error, while not reporting errors unnecessarily often.
Of course, the real problem is, what to do when the error occurs. Is it likely to be a transient problem, or is it the result of a permanent fault that requires attention?
Alexandre Vinçon's comment about the ACKNOWLEDGEMENTs is also relevant. It may be your device does not implement this, and extended delays may be intended.
The specification does not mention a particular value for the timeout, because it is not possible to normalize a timeout value for a wide range of MODBUS slaves.
However, it is a good assumption that you should receive a reply within a few hundreds of milliseconds.
I usually define my timeouts to 1 second with RTU and 500 ms with TCP.
Also, if the device takes a long time to reply, it is supposed to return an ACKNOWLEDGE message to prevent the expiration of the timeout.

WCF: System.Net.SocketException - Only one usage of each socket address (protocol/network address/port) is normally permitted

I have a WCF service and a Web application. Web application makes calls to this WCF service in a continous manner a.k.a polling. In our production environment, I receive this error very rarely. Since, this is an internal activity users were not aware of when this error is thrown.
Could not connect to
http://localhost/QAService/Service.svc.
TCP error code 10048: Only one usage
of each socket address
(protocol/network address/port) is
normally permitted 127.0.0.1:80. --->
System.Net.WebException: Unable to
connect to the remote server --->
System.Net.Sockets.SocketException:
Only one usage of each socket address
(protocol/network address/port) is
normally permitted 127.0.0.1:80
I am having trouble in reproducing this behaviour in our dev/qa environment. I have made sure that the client connection is closed in a try..catch..finally block. Still don't understand what is causing this issue .. any one aware of this?
Note: I've looked at this SO question, but not seems to be answering my problem, so it is not repeated questions.
You are overloading the TCP/IP stack. Windows (and I think all socket stacks actually) have a limitation on the number of sockets that can be opened in rapid sequence due to how sockets get closed under normal operation. Whenever a socket is closed, it enters the TIME_WAIT state for a certain time (240 seconds IIRC). Each time you poll, a socket is consumed out of the default dynamic range (I think its about 5000 dynamic ports just above 1024), and each time that poll ends, that particular socket goes into TIME_WAIT. If you poll frequently enough, you will eventually consume all of the available ports, which will result in TCP error 10048.
Generally, WCF tries to avoid this problem by pooling connections and things like that. This is usually the case with internal services that are not going over the internet. I am not sure if any of the wsHttp bindings support connection pooling, but the netTcp binding should. I would assume named pipes does not run into this problem. I couldn't say for the MSMQ binding.
There are two solutions you can use to get around this problem. You can either increase the dynamic port range, or reduce the period of TIME_WAIT. The former is probably the safer route, but if you are consuming an extremely high volume of sockets (which doesn't sound like the case for your scenario), reducing TIME_WAIT is a better option (or both together.)
Changing the Dynamic Port Range
Open regedit.
Open key HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
Edit (or create as DWORD) the MaxUserPort value.
Set it to a higher number. (i.e. 65534)
Changing the TIME_WAIT delay
Open regedit.
Open key HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
Edit (or create as DWORD) the TcpTimedWaitDelay.
Set it to a lower number. Value is in seconds. (i.e. 60 for 1 minute delay)
One of the above solutions should fix your problem. If it persists after changing the port range, I would see try increasing the period of your polling so it happens less frequently...that will give you more leeway to work around the time wait delay. I would change the time wait delay as a last resort.
HttpClient, although it implements IDisposable is a shared object, you should reduce the number of instances as much as possible. You can get away with only having one instance for the entire lifetime of your application rather than one for each request.
I wrote about it pretty extensively at http://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/

Resources