I am trying to understand the bottleneck in my setup, if one exists.
From an iPhone 11 app (the central, using the CoreBluetooth framework) I am writing to an L2CAP channel made available by a Raspberry Pi 4B (the peripheral, running BlueZ).
From the btmon log I see that the iPhone suggests an MTU and MPS of 1498, while the Pi suggests an MTU of 65535 and an MPS of 247.
Throughput improves with a higher MTU, but only up to a point: there is no difference between specifying an MTU of 5000 or 65535 on the peripheral side.
On the peripheral I have set both the connection interval minimum and maximum to 15 ms (and also tried 30 ms). Higher intervals result in lower throughput, as expected.
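For reference, I set the interval through the BlueZ debugfs knobs; a minimal sketch of what I run on the Pi (assumes the adapter is hci0 and root privileges; values are in units of 1.25 ms, so 12 ≈ 15 ms):

#include <stdio.h>

/* BlueZ exposes the default LE connection interval through debugfs.
 * Values are in units of 1.25 ms, so 12 corresponds to 15 ms. */
int main(void)
{
    const char *knobs[] = {
        "/sys/kernel/debug/bluetooth/hci0/conn_min_interval",
        "/sys/kernel/debug/bluetooth/hci0/conn_max_interval",
    };
    for (int i = 0; i < 2; i++) {
        FILE *f = fopen(knobs[i], "w");
        if (!f) { perror(knobs[i]); return 1; }
        fprintf(f, "12\n");
        fclose(f);
    }
    return 0;
}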
Throughput does not seem to go above 137 kbit/s, which is far lower than the 394 kbit/s shown by Apple at WWDC 2017.
Data length extension is available on both devices.
From the btmon logs I see that the majority of packets are of size 247 (243 bytes of payload), with a few of size 146 (142 bytes of payload). This could account for some slowness, but I doubt it makes the throughput drop by a factor of 3.
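As a sanity check on those numbers (my own back-of-the-envelope arithmetic): at a 15 ms interval there are about 66.7 connection events per second, so if only one 243-byte PDU went out per event, throughput would be 243 × 66.7 × 8 ≈ 130 kbit/s, which is suspiciously close to what I measure. It looks as if roughly one packet is sent per connection event, where several should fit.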
Am I missing something or is this the limit for my setup?
I would like to ask a question regarding Intel DSA (Data Streaming Accelerator).
According to the documentation, Intel DSA can support up to 256 queues and 65536 entries (the corresponding register fields are 8 and 16 bits wide, respectively). Are all descriptors stored locally in on-chip SRAM, or are they kept in DDR and fetched by the hardware?
I would think that such a large number of descriptors and queues would take up a lot of chip area, and that the circuit's timing and congestion would be difficult to control.
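To make the question concrete, my mental model of submission looks roughly like the sketch below (the descriptor layout is abbreviated and the field names are my own; the 64-byte push via MOVDIR64B is the only part I am sure about):

#include <immintrin.h>   /* _movdir64b; compile with -mmovdir64b */
#include <stdint.h>

/* Hypothetical, abbreviated view of a 64-byte DSA work descriptor. */
struct dsa_desc {
    uint32_t pasid_flags;
    uint8_t  opcode;          /* e.g. a memory-move operation */
    uint8_t  rsvd[3];
    uint64_t completion_addr; /* where the device writes the completion record */
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t xfer_size;
    uint8_t  pad[28];         /* pad out to 64 bytes */
} __attribute__((aligned(64)));

/* Software builds the descriptor in ordinary memory, then pushes all
 * 64 bytes to the device's mmap'ed work-queue portal in one shot. */
static void submit(volatile void *wq_portal, const struct dsa_desc *d)
{
    _movdir64b((void *)wq_portal, d);
}

So my question is really about where these descriptors live once the device has accepted them.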
I have a project in which I need the lowest latency possible (ideally in the 1-100 microsecond range) for communication between a computer (Windows + Linux + macOS) and a microcontroller (Arduino, STM32, or anything else).
I stress that it not only has to be fast, but must also have low latency (for example, a fast communication link to the Moon will still have a high latency).
So far I have tried serial over USB and HID packets over USB, and I get results of a little less than a millisecond. My measurement method is a round-trip communication, then dividing by two. This is OK, but I would be much happier with something faster.
EDIT:
The question seems to be quite hard to answer. The best workaround I found is to synchronize the clocks of the computer and the microcontroller. Synchronization does, of course, require communication. In the process below, dt is half a round trip and sync is the difference between the two clocks.
t = now_us();                     /* local clock, in microseconds */
write(ACK);                       /* ping the MCU */
read(&remote_t);                  /* MCU replies with its own timestamp */
dt = (now_us() - t) / 2;          /* one-way latency = half the round trip */
sync = now_us() - remote_t - dt;  /* offset between local and remote clocks */
Note that the imprecision of this synchronization is at most dt. The importance of the fastest possible communication channel still stands, but now I have an estimate of the precision.
Also note the technicalities related to timestamp conventions differing between systems (µs/ms since the Unix epoch on Linux, versus ms/µs since the MCU booted on Arduino).
Pay attention to clock drift on the Arduino. It is safer to synchronize often (before every measurement, in my case).
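For completeness, a fleshed-out host side of the exchange (a POSIX sketch; it assumes an already-opened serial fd and an MCU that answers the one-byte ping with its own microsecond timestamp as a little-endian int64):

#include <stdint.h>
#include <time.h>
#include <unistd.h>

static int64_t now_us(void)              /* local clock in microseconds */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* One sync exchange over an already-opened serial fd. Simplified: a
 * robust version would loop until all 8 reply bytes have been read. */
static int64_t sync_offset(int fd)
{
    uint8_t ping = 0x06;                     /* ACK */
    int64_t remote_t = 0;
    int64_t t = now_us();
    write(fd, &ping, 1);
    read(fd, &remote_t, sizeof remote_t);
    int64_t dt = (now_us() - t) / 2;         /* one-way latency estimate */
    return now_us() - remote_t - dt;         /* local minus remote clock */
}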
USB raw HID with a hacked 8 kHz poll rate (125 µs poll interval), combined with a Teensy 3.2 (or above). Mouse overclockers have achieved an 8 kHz poll rate with low USB jitter, and the Teensy 3.2 (an Arduino-compatible board) can do an 8 kHz poll rate with a slightly modified USB FTDI driver on the PC side.
Barring this, if you need even better, you're now looking at PCI-Express parallel ports, to do lower-latency signalling via digital pins directly on the parallel port. They must be true parallel ports, not ones behind a USB layer. DOS apps on gigahertz-class PCs have been tested to reach sub-1 µs signalling over parallel port pins (on a 1.4 GHz Pentium 4), but if you write a virtual device driver, you can probably get sub-100 µs within Windows.
Use raised priorities and critical sections out the wazoo, preferably a non-garbage-collected language, a minimum of background apps, and essentially consume 100% of a CPU core on your critical loop, and you can reliably achieve <100 µs. Not 100% of the time, but certainly in five-nines territory (and probably far better than that), if you can tolerate such rare aberrations.
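A minimal Linux sketch of that setup (the priority value and core number are arbitrary; Windows has equivalent knobs in SetThreadPriority/SetThreadAffinityMask):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Real-time FIFO scheduling at a high priority (needs root or
     * CAP_SYS_NICE; 99 is the usual Linux maximum for SCHED_FIFO). */
    struct sched_param sp = { .sched_priority = 99 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    /* Pin the process to one core so the critical loop owns it. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                        /* core 3: arbitrary choice */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        perror("sched_setaffinity");

    for (;;) {
        /* busy-poll the device here instead of sleeping */
    }
}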
To answer the question, there are two low-latency methods:
A serial or parallel port. It is possible to get the latency down to the millisecond scale, although your performance may vary depending on the manufacturer. One good brand is Brainboxes, although their cards can cost over $100! (See the pin-toggling sketch after this list.)
Write your own driver. It should be possible to achieve latencies on the order of a few hundred microseconds, although obviously the kernel can interrupt your process midway if it needs to serve something with a higher priority. This is how a lot of scientific equipment actually works (and a lot of the people telling you that a PC can't be made to work on short deadlines are wrong).
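To illustrate the parallel port option, direct pin signalling can be as simple as the following (x86 Linux, needs root; 0x378 is the legacy LPT1 base and just a placeholder here, since a PCIe card reports its own base address):

#include <stdio.h>
#include <sys/io.h>   /* ioperm/outb: x86 Linux only */

int main(void)
{
    /* Ask the kernel for direct access to the port's 3 registers. */
    if (ioperm(0x378, 3, 1) != 0) {
        perror("ioperm");
        return 1;
    }
    for (;;) {
        outb(0x01, 0x378);   /* raise data pin D0 */
        outb(0x00, 0x378);   /* lower data pin D0 */
    }
}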
For info, I just ran some tests on a Windows 10 PC fitted with two dedicated PCIe parallel port cards.
Sending TTL (square wave) pulses out using Python code (actually using PsychoPy Builder and PsychoPy Coder), the 2-channel oscilloscope showed very consistent offsets between the two pulses of 4 µs to 8 µs.
This was when the Python code was run at 'above normal' priority.
When run at normal priority it was mostly the same, apart from a very occasional 30 µs gap, presumably when task switching took place.
In short, PCs aren't set up to handle deadlines that short. Even using a bare-metal RTOS on an Intel Core series processor, you end up with interrupt latency (how fast the processor can respond to interrupts) in the 2-3 µs range (see http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/industrial-solutions-real-time-performance-white-paper.pdf).
That's ignoring any sort of communication link like USB or Ethernet (or anything else) that requires packetizing data, handshaking, buffering to avoid data loss, etc.
USB stacks are going to have latency, regardless of how fast the link is, because of buffering to avoid data loss. The same goes for Ethernet. Really, any modern stack driver on a full-blown OS isn't going to be capable of low latency, because of everything else going on in the system and the need for robustness in the protocols.
If you have deadlines in the single-digit microsecond range (or even in the millisecond range), you really need to do your real-time processing on a microcontroller and have the slower control loop/visualization handled by the host.
You have no guarantees about latency to userland without a real-time operating system. You're at the mercy of the kernel, its time slices, and its preemption rules, which could add more than your 100 µs maximum.
In order for a workstation to respond to a hardware event, you have to use interrupts and a device driver.
Your options are limited to interfaces that offer an IRQ:
Hardware serial/parallel port.
PCI
Some interface bridge on PCI.
Or, if you're into abusing I/O, the sound card.
USB is not one of them; it has a 1 kHz polling rate.
Maybe Thunderbolt does, but I'm not sure about that.
Ethernet
Look for a board that has a gigabit Ethernet port connected directly to the microcontroller, and connect it to the PC directly with a crossover cable.
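You can then skip the IP stack entirely and exchange raw Ethernet frames. A minimal sender sketch (Linux, needs CAP_NET_RAW; the interface name, the MAC addresses, and the local-experimental EtherType 0x88B5 are placeholders):

#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Raw socket: we build the whole frame ourselves. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(0x88B5));
    if (fd < 0) return 1;

    struct sockaddr_ll addr = {0};
    addr.sll_family  = AF_PACKET;
    addr.sll_ifindex = if_nametoindex("eth0");            /* placeholder NIC */
    addr.sll_halen   = ETH_ALEN;
    unsigned char mcu_mac[ETH_ALEN]  = {0x02, 0, 0, 0, 0, 0x01}; /* placeholder */
    unsigned char host_mac[ETH_ALEN] = {0x02, 0, 0, 0, 0, 0x02}; /* placeholder */
    memcpy(addr.sll_addr, mcu_mac, ETH_ALEN);

    unsigned char frame[ETH_ZLEN] = {0};          /* 60-byte minimum frame */
    memcpy(frame, mcu_mac, ETH_ALEN);             /* destination MAC */
    memcpy(frame + ETH_ALEN, host_mac, ETH_ALEN); /* source MAC */
    frame[12] = 0x88; frame[13] = 0xB5;           /* EtherType */
    frame[14] = 0x42;                             /* one byte of payload */

    sendto(fd, frame, sizeof frame, 0, (struct sockaddr *)&addr, sizeof addr);
    close(fd);
    return 0;
}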
Is there a way to exclusively reserve the GPU for an OpenCL host program?
No other process shall have access to this device via OpenCL or OpenGL.
The background is that my software computes real-time data on the GPU, so it is bad for performance if the GPU is doing other work as well.
I have a couple of questions related to mobile network generations:
When comparing mobile network generations, people mention that 1G and 2G have to use the maximum bandwidth. What is bandwidth in this context, and why do they use the maximum bandwidth?
1G and 2G are narrowband networks. What does narrowband mean here?
3G and 4G are wideband networks. Don't they use the maximum bandwidth?
They use the maximum bandwidth to transmit analogue voice information. Since the available bandwidth is limited, and there is a lower limit on the bandwidth needed to keep the voice recognizable on the other side, they cannot go below this fixed bandwidth.
It refers to the frequency span of the carrier. If the network is narrowband, that span is usually just a few kilohertz, which makes the available bandwidth low (this also depends on the kind of modulation used).
They don't need the maximum available bandwidth for voice communications, since acceptable voice quality only requires transmitting a few kilohertz of analogue information (telephone-quality speech occupies roughly 300 Hz to 3.4 kHz).