What factors maximise BLE throughput using an L2CAP channel in iOS? - bluetooth-lowenergy

I am trying to understand the bottleneck in my setup, if one exists.
From an iPhone 11 app (central, using the CoreBluetooth framework) I am writing to an L2CAP channel made available by a Raspberry Pi 4B (peripheral, using BlueZ).
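A minimal sketch of the CoreBluetooth calls involved (the PSM value, chunk size and test payload below are illustrative, not my actual values):

```swift
import CoreBluetooth

final class L2CAPWriter: NSObject, CBPeripheralDelegate, StreamDelegate {
    private var channel: CBL2CAPChannel?
    private let payload = Data(repeating: 0xAB, count: 1_000_000) // test data
    private var offset = 0

    // Call once the peripheral is connected; 0x0080 is a placeholder PSM.
    func open(on peripheral: CBPeripheral) {
        peripheral.delegate = self
        peripheral.openL2CAPChannel(CBL2CAPPSM(0x0080))
    }

    func peripheral(_ peripheral: CBPeripheral, didOpen channel: CBL2CAPChannel?, error: Error?) {
        guard let channel = channel, error == nil else { return }
        self.channel = channel
        channel.outputStream.delegate = self
        channel.outputStream.schedule(in: .main, forMode: .default)
        channel.outputStream.open()
    }

    // Push data whenever the output stream reports space, in large chunks.
    func stream(_ aStream: Stream, handle eventCode: Stream.Event) {
        guard eventCode.contains(.hasSpaceAvailable),
              let out = channel?.outputStream,
              offset < payload.count else { return }
        let chunk = payload.subdata(in: offset..<min(offset + 8192, payload.count))
        let written = chunk.withUnsafeBytes {
            out.write($0.bindMemory(to: UInt8.self).baseAddress!, maxLength: chunk.count)
        }
        if written > 0 { offset += written }
    }
}
```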
From the btmon log I see that the iPhone suggests an MTU and MPS of 1498, while the Pi suggests an MTU of 65535 and an MPS of 247.
Throughput improves with a higher MTU, but only up to a point: there is no difference between specifying an MTU of 5000 or 65535 on the peripheral side.
On the peripheral I have set both the minimum and maximum connection interval to 15 ms (and also tried 30 ms). Higher intervals result in lower throughput, as expected.
Throughput does not seem to go above 137 kbit/s, which is far lower than the 394 kbit/s shown by Apple at WWDC 2017.
Data length extension is available on both devices.
From the btmon logs I see that the majority of packets are 247 bytes (243 bytes of payload), with a few of 146 bytes (142 bytes of payload). This could account for some slowness, but I doubt it reduces throughput by a factor of 3.
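A rough sanity check on those numbers (it assumes nearly all packets carry the 243-byte payload):

```swift
let observedBytesPerSec = 137_000.0 / 8            // 137 kbit/s ≈ 17,125 B/s
let connectionInterval = 0.015                      // 15 ms
let bytesPerEvent = observedBytesPerSec * connectionInterval   // ≈ 257 B per event
let pdusPerEvent = bytesPerEvent / 243              // ≈ 1.06

// Apple's 394 kbit/s figure at the same interval would need about 3 PDUs per event:
let pdusNeeded = (394_000.0 / 8) * connectionInterval / 243    // ≈ 3.0
```

If that arithmetic is right, only about one full data PDU is going out per connection event, whereas Apple's figure corresponds to roughly three.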
Am I missing something or is this the limit for my setup?

Related

Movesense 1.6.2 send_ble_nus_data B/s?

What throughput (B/s) can be achieved with Movesense send_ble_nus in 1.6.2? I assume a packet length of 20 bytes is optimal. At 50 Hz * 20 B = 1000 B/s there is no loss when listening with the Xamarin Forms https://github.com/aritchie/bluetoothle component on Windows 10 and Android 8.1. At 100 Hz * 20 B = 2000 B/s some packets are lost (Windows 10 <1 %, Android 8.1 <0.1 %). Can 2000 B/s be achieved, e.g. with MTU changes or with more optimal code?
The Movesense sensor supports up to a 158-byte MTU and BLE 4.2 Data Length Extension. If the counterpart knows how to use the large MTU and DLE, the optimum is to fill it completely, i.e. to put the data in 155-byte packets. Theoretically it is possible to reach speeds of up to 800 kbps, but in practice with a mobile phone it will be less (maybe much less).
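To put the 2000 B/s target into perspective against those limits (simple arithmetic on the numbers above):

```swift
let targetBytesPerSec = 2000.0      // the rate the question asks about
let payloadPerPacket = 155.0        // bytes per packet when the 158-byte MTU is filled
let packetsPerSec = targetBytesPerSec / payloadPerPacket   // ≈ 13 packets/s

let theoreticalMax = 800_000.0 / 8  // 800 kbps ≈ 100,000 B/s
let utilisation = targetBytesPerSec / theoreticalMax       // ≈ 2% of the theoretical ceiling
```

So the raw link rate is not the limiting factor at 2000 B/s; the losses are more likely down to connection parameters or the receiving stack.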
With Android it is easy to see which connection parameters are negotiated by enabling the Bluetooth HCI snoop log in Developer settings and studying the resulting log file with the Wireshark protocol analyzer.

What happens when the Ethernet reception buffer is full

I have quite a newbie question: assume that I have two devices communicating via Ethernet (TCP/IP) at 100 Mbps. On one side I feed the device with data to transmit; on the other side I consume the received data. I can choose an adequate buffer size for both devices.
My question is: if the data consumption rate on the second device is slower than the data feeding rate on the first one, what will happen?
I found some references to an overrun counter.
Is there anything in the Ethernet communication indicating that a device is momentarily busy and cannot receive new packets, so that the receiving device can pause the transmission?
Can someone point me to a document or documents that explain this issue in detail? I didn't find any.
Thanks in advance.
The Ethernet protocol runs on the MAC controller chip. The MAC has two separate rings, an RX ring (for ingress packets) and a TX ring (for egress packets), which means it is full-duplex in nature. The RX/TX rings are backed by on-chip FIFOs, but the rings themselves hold PDUs in host memory buffers. I have covered a bit of this functionality in a related post.
Congestion can happen, but again RX and TX are two separate paths, and it will be due to the following conditions:
Queueing/dequeueing of RX/TX buffers is NOT fast enough compared to the line rate. This happens when the CPU is busy and does not honor interrupts quickly enough.
Host memory is slow (e.g. DRAM rather than SRAM), or there is not enough memory (e.g. due to a memory leak).
Intermediate processing of the buffers takes too long.
Now, about the peer device: back-pressure can be handled within a standalone system, and when that happens we usually tail-drop packets. This is agnostic to the peer device; if the peer device is slow, that is that device's problem.
The definition of overrun is: the number of times the receiver hardware was unable to move received data into a hardware buffer because the input rate exceeded the receiver's ability to handle it.
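If you want to watch that counter on a Linux host, one way is to read it from /proc/net/dev (a sketch; it assumes the conventional column order, where the fifth receive field is the FIFO count that tools such as ifconfig report as RX overruns):

```swift
import Foundation

// Read the RX "fifo" (overrun) counter for one interface from /proc/net/dev.
func rxOverruns(interface: String) -> Int? {
    guard let text = try? String(contentsOfFile: "/proc/net/dev", encoding: .utf8) else { return nil }
    for line in text.split(separator: "\n") {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces) == interface else { continue }
        // Receive columns: bytes packets errs drop fifo frame compressed multicast
        let fields = parts[1].split(separator: " ").compactMap { Int($0) }
        return fields.count > 4 ? fields[4] : nil
    }
    return nil
}

if let overruns = rxOverruns(interface: "eth0") {
    print("eth0 RX overruns: \(overruns)")
}
```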
I recommend reading any MAC controller's datasheet (e.g. an Intel Ethernet controller); it will cover all of your questions, as will the device driver source for any MAC controller.
TCP/IP is an upper-layer stack that sits inside the kernel (it can also run in user space), whereas the Ethernet (ARPA) protocol is handled inside the MAC controller hardware. Once you understand this, you will understand the difference between routers and switches (which have no TCP/IP stack).

How many Bluetooth devices can simultaneously be connected to/scanned by a Bluetooth Low Energy master?

Is there a maximum in the specification? Do they start to interfere if many try to connect at the same time?
What are the modes of communication? Is there a secured mode or something else?
What is the maximum packet size?
Can I send an image or a sound using BLE?
There is no limit in the specification. In reality, at least around a millisecond must be allocated to serve each connection event, so with a 7.5 ms connection interval you cannot expect more than about 10 connections at most without dropped packets (and therefore higher latency). Connection setup and scanning will also miss a large number of packets if the radio is busy handling current connections.
The maximum payload is 31 bytes for advertisements (up to Bluetooth 4.2). While connected, the longest packet payload is 27 bytes. Bluetooth 4.2 defines a packet length extension allowing larger packets, but far from all implementations support it.
The security that BLE offers is the bonding procedure. After bonding, the devices have established a shared secret key which is then used to encrypt and sign all data being sent.
Sending normal-sized images or sounds will take several seconds, since the throughput is quite low.
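As a rough illustration of "several seconds" (assumed numbers: a 100 kB image and an effective application throughput of about 10 kB/s, which may already be optimistic for a pre-4.2 link):

```swift
let imageBytes = 100_000.0     // a 100 kB image (assumption)
let throughput = 10_000.0      // ~10 kB/s effective throughput (assumption)
let seconds = imageBytes / throughput   // = 10 seconds
```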
I think you should really read the Bluetooth specification or some summary to get the answer to your questions.

Slow detection of Radius Networks Dot beacon

I'm trying to use a WiPy board as a BLE scanner and we're seeing some strange behaviour with Radius Networks Dot beacons.
We are testing beacons from several manufacturers, all emitting as iBeacon with an advertising interval of 100 ms (10 per second). On the board we detect at least 4-5 advertising packets per second from these beacons, but most of the time only 0 or 1 from the Radius Dot beacon.
We've tried both the AltBeacon and iBeacon configurations and the results are similar.
This is a screenshot of the configuration with the RadBeacon app:
Are we configuring something wrong, or are the beacons behaving unexpectedly?
Not 100% of the advertising packets sent out by a Bluetooth LE device will be detected by receiving devices. The actual percentage received depends on a number of factors, including:
Transmitter power level
Distance between the two devices
Radio noise in the area
Bluetooth radio congestion
Antennas on both transmitter and receiver
Orientation of antennas
Under good conditions (close range, high transmitter power), I typically see 80-90% of packets detected by Android and iOS devices, which let you easily count individual BLE packets.
Since your detection rate is much lower, you may want to try a number of things:
Increase your transmitter power level from -18 dBm to 3 dBm. Having the weakest power output configured is the most likely cause of your issue.
Bring your transmitter and receiver closer together.
If the above two suggestions don't help, I would use an independent tool such as an Android phone to actually count the number of BLE packets detected. I have a bare-bones app you can run on Android to do this here, but you'll need a copy of Android Studio to build and run it. (An equivalent quick count on iOS is sketched below.)
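For completeness, counting received advertisements on iOS takes only a few lines of CoreBluetooth; scanning with duplicates allowed makes every received packet trigger a callback (a minimal sketch; note that iOS tends to hide iBeacon frames from CoreBluetooth, so this works best with the AltBeacon configuration):

```swift
import CoreBluetooth

// Counts every advertisement received in each one-second window.
final class PacketCounter: NSObject, CBCentralManagerDelegate {
    private var central: CBCentralManager!
    private var count = 0

    override init() {
        super.init()
        central = CBCentralManager(delegate: self, queue: nil)
        Timer.scheduledTimer(withTimeInterval: 1.0, repeats: true) { [weak self] _ in
            guard let self = self else { return }
            print("advertisements in the last second: \(self.count)")
            self.count = 0
        }
    }

    func centralManagerDidUpdateState(_ central: CBCentralManager) {
        guard central.state == .poweredOn else { return }
        // AllowDuplicates delivers a callback for every advertisement instead of
        // coalescing them per peripheral.
        central.scanForPeripherals(withServices: nil,
                                   options: [CBCentralManagerScanOptionAllowDuplicatesKey: true])
    }

    func centralManager(_ central: CBCentralManager, didDiscover peripheral: CBPeripheral,
                        advertisementData: [String: Any], rssi RSSI: NSNumber) {
        count += 1   // filter on advertisementData here to count only your beacon
    }
}
```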

PCIe Bandwidth on ATI FirePro

I am trying to measure the PCIe bandwidth on an ATI FirePro 8750. The AMD APP SDK sample PCIeBandwidth measures the bandwidth of transfers from:
Host to device, using clEnqueueWriteBuffer().
Device to host, using clEnqueueReadBuffer().
On my system (Windows 7, Intel Core 2 Duo, 32-bit) the output looks like this:
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : ATI RV770
Host to device : 0.412435 GB/s
Device to host : 0.792844 GB/s
This particular card has 2 GB of DRAM and a maximum clock frequency of 750 MHz.
1- Why is the bandwidth different in each direction?
2- Why is the bandwidth so small?
Also, I understand that this communication takes place through DMA, so the bandwidth may not be affected by the CPU.
This paper from Microsoft Research gives some insight into why the PCIe data transfer bandwidth between GPU and CPU is asymmetric. The paper describes performance measurements of FPGA-GPU data transfer bandwidth over PCIe, and it also includes measurements of CPU-GPU transfer bandwidth over PCIe.
To quote the relevant section:
'it should also be noted that the GPU-CPU transfers themselves also show some degree of asymmetric behavior. In the case of a GPU to CPU transfer, where the GPU is initiating bus master writes, the GPU reaches a maximum of 6.18 GByte/Sec. In the opposite direction from CPU to GPU, the GPU is initiating bus master reads and the resulting bandwidth falls to 5.61 GByte/Sec. In our observations it is typically the case that bus master writes are more efficient than bus master reads for any PCIe implementation due to protocol overhead and the relative complexity of implementation. While a possible solution to this asymmetry would be to handle the CPU to GPU direction by using CPU initiated bus master writes, that hardware facility is not available in the PC architecture in general.'
The answer to the second question, about why the bandwidth is so small, could be down to the size of the data transfers.
See figures 2, 3, 4 and 5. I have also seen graphs like this at the 1st AMD Fusion conference. The explanation is that PCIe data transfers carry overheads from the protocol and the device latency; these overheads are more significant for small transfer sizes and become less significant for larger ones.
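A simple fixed-overhead model shows the shape of those curves: each transfer pays a setup latency on top of size divided by peak bandwidth. The numbers below (20 µs setup, 6 GB/s peak) are made up for illustration but reproduce the behaviour described:

```swift
import Foundation

let peakBandwidth = 6.0e9      // bytes/s, assumed link peak
let setupLatency  = 20e-6      // seconds of fixed per-transfer overhead, assumed

func effectiveBandwidth(bytes: Double) -> Double {
    bytes / (setupLatency + bytes / peakBandwidth)
}

for size in [4e3, 128e3, 256e3, 1e6, 2e6] {
    let fraction = effectiveBandwidth(bytes: size) / peakBandwidth
    print(String(format: "%7.0f KB -> %4.1f%% of peak", size / 1e3, fraction * 100))
}
// Small transfers are dominated by the fixed overhead; large transfers amortise it.
```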
What levers do you have to control or improve performance?
Getting the right combination of chipset/motherboard and GPU is the hardware lever: chips with the maximum number of PCIe lanes are better, and a higher-spec PCIe generation (PCIe 3.0 rather than PCIe 2.0) is better, but all components need to support the higher standard.
As a programmer, controlling the data transfer size is a very important lever.
Transfer sizes of 128K-256K bytes get approximately 50% of the maximum bandwidth; transfers of 1M-2M bytes get over 90%.
