What is the lowest latency communication method between a computer and a microcontroller? - arduino

I have a project in which I need to have the lowest latency possible (in the 1-100 microseconds range at best) for a communication between a computer (Windows + Linux + MacOSX) and a microcontroller (arduino or stm32 or anything).
I stress that not only it has to be fast, but with low latency (for example a fast communication to the moon will have a low latency).
For the moment the methods I have tried are serial over USB or HID packets over USB. I get results around a little less than a millisecond. My measurement method is a round trip communication, then divide by two. This is OK, but I would be much more happy to have something faster.
EDIT:
The question seems to be quite hard to answer. The best workaround I found is to synchronize clocks of the computer and microcontroler. Synchronization requires communication indeed. With the process below, dt is half a round trip, and sync is the difference between the clocks.
t = time()
write(ACK);
read(remotet)
dt = (time() - t) / 2
sync = time() - remotet - dt
Note that the imprecision of this synchronization is at most dt. The importance of the fastest communication channel stands, but I have an estimation of the precision.
Also note technicalities related to the difference of timestamp on different systems (us/ms based on epoch on Linux, ms/us since the MCU booted on Arduino).
Pay attention to the clock shift on Arduino. It is safer to synchronize often (every measure in my case).

USB Raw HID with hacked 8KHz poll rate (125us poll interval) combined with Teensy 3.2 (or above). Mouse overclockers have achieved 8KHz poll rate with low USB jitter, and Teensy 3.2 (Arduino clone) is able to do 8KHz poll rate with a slightly modified USB FTDI driver on the PC side.
Barring this, and you need even better, you're now looking at PCI-Express parallel ports, to do lower-latency signalling via digital pins directly to pins on the parallel port. They must be true parallel ports, and not through a USB layer. DOS apps on gigahertz-level PCs were tested to get sub-1us ability (1.4Ghz Pentium IV) with parallel port pin signalling, but if you write a virtual device driver, you can probably get sub-100us within Windows.
Use raised priority & critical sections out of the wazoo, preferably a non-garbage-collected language, minimum background apps, and essentially consume 100% of a CPU core on your critical loop, and you can definitely reliably achieve <100us. Not 100% of the time, but certainly in the territory of five-nines (and probably even far better than that). If you can tolerate such aberrations.

To answer the question, there are two low latency methods:
Serial or parallel port. It is possible to get latency down to the millisecond scale, although your performance may vary depending on manufacturer. One good brand is Brainboxes, although their cards can cost over $100!
Write your own driver. It should be possible to achieve latencies on the order of a few hundred micro seconds, although obviously the kernel can interrupt your process mid-way if it needs to serve something with a higher priority. This how a lot of scientific equipment actually works. (and a lot of the people telling you that a PC can't be made to work on short deadlines are wrong).

For info, I just ran some tests on a Windows 10 PC fitted with two dedicated PCIe parallel port cards.
Sending TTL (square wave) pulses out using Python code (actually using Psychopy Builder and Psychopy coder) the 2 channel osciloscope showed very consistant offsets between the two pulses of 4us to 8us.
This was when the python code was run at 'above normal' priority.
When run at normal priority it was mostly the same apart from a very occassional 30us gap, presumably when task switching took place)

In short, PCs aren't set up to handle that short of deadline. Even using a bare metal RTOS on an Intel Core series processor you end up with interrupt latency (how fast the processor can respond to interrupts) in the 2-3 µS range. (see http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/industrial-solutions-real-time-performance-white-paper.pdf)
That's ignoring any sort of communication link like USB or ethernet (or other) that requires packetizing data, handshaking, buffering to avoid data loss, etc.
USB stacks are going to have latency, regardless of how fast the link is, because of buffering to avoid data loss. Same with ethernet. Really, any modern stack driver on a full blown OS isn't going to be capable of low latency because of what else is going on in the system and the need for robustness in the protocols.
If you have deadlines that are in the single digit of microseconds (or even in the millisecond range), you really need to do your real time processing on a microcontroller and have the slower control loop/visualization be handled by the host.

You have no guarantees about latency to userland without real time operating system. You're at the mercy of the kernel and it's slice time and preemption rules. Which could be higher than your maximum 100us.
In order for a workstation to respond to a hardware event you have to use interrupts and a device driver.
Your options are limited to interfaces that offer an IRQ:
Hardware serial/parallel port.
PCI
Some interface bridge on PCI.
Or. If you're into abusing IO, the soundcard.
USB is not one of them, it has a 1kHz polling rate.
Maybe Thunderbolt does, but I'm not sure about that.

Ethernet
Look for a board that has a gigabit ethernet port directly connected to the microcontroller, and connect it to the PC directly with a crossover cable.

Related

Is there a way to send data from the FPGA logic on a Zedboard to an external CPU without involvement of the ZYNQ PS?

I am a high school student, who is not very familiar with FPGAs and the Xilinx line.
I am running a ring oscillator module on a Zybo Z7 board. I am also running a counter module, which I want to sample at a high rate. I am currently sending the data through AXI to the ZYNQ processing system, and then using the inbuilt UART to USB buffer to send the data through a USB cable to my computer. On the computer side, I treat this input as a virtual serial line, and use a python script to take and log the data from an IOstream. This method takes very long, however, and I am trying to increase the sample speed. Thus, I was wondering if I could bypass the onboard PS, and connect the FPGA fabric directly to the UART buffer.
I have tried optimizing my PS code, which I have written in C. I have reached the point where it takes 30 oscillations of the onboard ZYNQ clock between samples. Now, however, I have created a newer and more reliable sampling framework in the FPGA logic, which requires a 'handshake' mechanism to start and stop the counter between samples. It takes a very long time for the PS to sample the counter, send the sample, and then restart the counter. Thus, the uptime of my sampling framework is a fraction of what I want it to be. Removing the PS would be ideal, as I know I can automate this handshake signal within the PL if I am able to connect it to a UART interface.
You can implement logic in the PL that can handle the UART communication thus bypassing the PS.
Here's an implementation you can try using:https://github.com/jakubcabal/uart-for-fpga
You would connect the UART pins to one of the Zybo Z7 Pmod ports and use an external USB to UART adapter such as this one, anyone would work as long it supports 3.3V: https://www.adafruit.com/product/5335
The adapter built into the board is connected to directly to the PS MIO pins and cannot be used by the PL.

Raspberry communication with multiple Arduino's over long distance wire

Recently i was digging information about communication between RaspberryPi and multiple Arduino slaves over long distance wire (10-15 meters). My initial thought was to use I2C, but after doing some research i have found out that wire length is a problem since it does not capable to transit/receive data over such a distance. Maybe someone would have any suggestions?
I was thinking about another approach - communication over ethernet (using shields). I would place a switch between all the Arduino nodes and Raspberry with multi threaded TCP server on RPI. Does it sound reasonable?
P.S. Wireless communication methods are not allowed.
You can use one of many standards for communications, such as RS-485 or CAN-bus. Both of those allow for "long" distances, but the longer the wire, the slower the speed.
You will need transceivers for each device, but you can buy pre-made modules for quite cheap.

Does chrome.serial API ensure data integrity?

I'm trying to understand whether its redundant for me to include some kind of CRC or checksum in my communication protocol. Does the chrome.serial and other chrome hardware communication API's in general if anyone can speak to them (e.g. chrome.hid, chrome.bluetoothLowEnergy, ...)
Serial communications is simply a way of transmitting bits and its major reason for existence is that it's one bit at a time -- and can therefore work over just a single communications link, such as a simple telephone line. There's no built-in CRC or checksum or anything.
There are many systems that live on top of serial comms that attempt to deal with the fact that communications often takes place in a noisy environment. Back in the day of modems over telephone lines, you might have to deal with the fact that someone else in the house might pick up another extension on the phone line and inject a bunch of noise into your download. Thus, protocols like XMODEM were invented, wrappering serial comms in a more robust framework. (Then, when XMODEM proved unreliable, we went to YMODEM and ZMODEM.)
Depending on what you're talking to (for example, a device like an Arduino connnected to a USB serial port over a wire that's 25 cm long) you might find that putting the work into checksumming the data isn't worth the trouble, because the likelihood of interference is so low and the consequences are trivial. On the other hand, if you're talking to a controller for a laser weapon, you might want to make sure the command you send is the command that's received.
I don't know anything about the other systems you mention, but I'm old enough to have spent a lot of time doing serial comms back in the '80s (and now doing it again for devices using chrome.serial, go figure).
I'm using Chrome's serial API to communicate with Arduino devices, and I have yet to experience random corruption in the middle of an exchange (my exchanges are short bursts, 50-500 bytes max). However, I do see garbage bytes blast out if a connection is flaky or a cable is "rudely" disconnected (like a few minutes ago when I tripped over the FTDI cable).
In my project, a mis-processed command wont break anything, and I can get by with a master-slave protocol. Because of this, I designed a pretty slim solution: The Arduino slave listens for an "attention byte" (!) followed by a command byte, after which it reads a fixed number of data bytes depending on the command. Since the Arduino discards until it hears an attention byte and a valid command, the breaking-errors usually occur when a connection is cut while a slave is "awaiting x data bytes". To account for this, the first thing the master does on connect is to blindly blast out enough AT bytes to push the Arduino through "awaiting data" even in the worst-case-scenario. Crude, yet sufficient.
I realize my solution is pretty lo-fi, so I did a bit of surfing around and I found this post to be pretty comprehensive: Simple serial point-to-point communication protocol
Also, if you need a strategy for error-correction over error-detection/re-transmission (or over my strategy, which I guess is "error-brute-forcing"), you may want to check out the link to a technique called "Hamming," near the bottom of that thread - That one looked promising!
Good luck!
-Matt

Is it really necessary the handshakng on an RS232 connection?

I'm building an electronic device that has to be prepared for RS232 connections, and I'd like to know if it's really necessary to make room for more than 3 pins (Tx, Rx, GND) on each port.
If I don't use the rest of signals (those made for handshaking): am I going to find problems communicating with any device?
Generally, yes, that's a problem. The kind of problem that you can only avoid if you can give specific instructions to the client on how to configure the port on his end. Which is never not a problem, if that's not done properly then data transfer just won't occur and finding out why can be very awkward. You are almost guaranteed to get a support call.
A lot of standard programs pay attention to your DTR signal, DSR on their end. Data Terminal Ready indicates that your device is powered up and whatever the client receives is not produced by electrical noise. Without DSR they'll just ignore what you send. Very simple to implement, just tie it to your power supply.
Pretty common is flow control through the RTS/CTS signals. If enabled in the client program, it won't send you anything until you turn on the Request To Send signal. Again very simple to implement if you don't need flow control, just tie it logically high like DTR so the client program's configuration doesn't matter.
DCD and Ring are modem signals, pretty unlikely to matter to a generic device. Tie them logically low.
Very simple to implement, avoids lots of mishaps and support calls, do wire them.
And do consider whether you can actually live without flow control. It is very rarely a problem on the client end, modern machines can very easily keep up with the kind of data rates that are common on serial ports. That is not necessarily the case on your end, the usual limitation is the amount of RAM you can reserve for the receive buffer and the speed of the embedded processor. A modern machine can firehose you with data pretty easily. If your uart FIFO or receive interrupt handler or data processing code cannot keep up then the inevitable data loss is very hard to deal with. Not an issue if you use RTS/CTS or Xon/Xoff handshaking or if you use a master/slave protocol or are comfortable with a low enough baudrate.

Serial protocol for sending image data

We have a custom-built microcontroller card (ST32 / ARM Cortex M3) which has a camera attached. The camera captures 10-bit greyscale at 1280x1024 resolution. We need to send that image data back to a PC host over serial. That's quite a big chunk of data; at 115200 baud transfer would be 3 minutes, assuming everything goes fine. Anything I implement to ensure robustness would seem to slow that process down (eg split into blocks, checksum the blocks, ask for resend if corrupt). So was wondering how people might make a good compromise between speed and integrity.
We are currently seeing real transfer times of about 6 minutes. We had to set the UART baud rate at a weird value - 1036800 - because at 115200 there were issues (PC is running at 115200). I'm more software than hardware so any thoughts as to why that might happen would be helpful!
Start by doing some easy compression on your image.
Either run-length encoding or delta encoding will give you less data to send.
There are much better algorithms like TIFF but you may want to trade off the complexity of TIFF-ing your buffer for easier software on the embedded side.
Then you can afford something simple like Xmodem for your compressed data.
That has the useful property of being a standard protocol too.
That might lead you to using a terminal+xmodem transfer style interface to your host.
That would make debugging the interface pretty simple too.
Tim Williscroft's answer about compressing your data is a nice first step.
Now from the serial protocol side, the real transfer rate depends a lot on how you configure and implement your software both side. The baud rate is not the only thing to care about:
Are you using hardware flow control? If using hardware flow control, you will be able to significantly increase the baud rate (x10) without generating overrun errors.
From the STM32 are you using DMA, interrupts or even worth polling method to manage the data transmission? I don't know the exact STM32 reference you are using but on the STM32 I used, the UART transmission FIFO was limited to 1 byte. So you are merely obliged to use DMA if you have performance issues.
Still from the STM32 side, you can greatly improve performances taking care on the bus accesses (and possible conflicts arbitration) your application is doing.
Moreover on STM32, all clocks are configurable. Using an external high speed oscillator (if one available on board) may be a good way to improve performance over the internal RC oscillator. Also take care about internal bus clock configuration!
Now from the PC side, the performance may be impacted depending how your application bufferize & treat the received data.
The first thing to do is to look where the time is taken:
Observe your UART signal with a scope. As you said the transfer takes twice the theoretical time, you shouldn't see a continuous signal. Without hardware flow control it is the STM32 that takes time to output data. With hardware flow control, also look at the flow control signal to determinate which side causes the pauses (it may be both).

Resources