The elegant way to handle ADCs with DMA in a RTOS - tcp

I'm currently setting up an AZURE RTOS (ThreadX on STM32), with Ethernet, SPI and ADCs activated.
This STM32 has to pass-through configuration information from time to time, coming from my PC over the Ethernet-Port.
It has to pass these information via SPI to two other STM32, which makes the first STM32 the system-controller / system-interface. This will be a low-priority task, since the activation of the passed configuration will be started by sync-lines, running from the system-controller to the two other STMs.
While doing so, the system-controller has to read-in ADC values constantly and pass them via Ethernet / TCP to my computer.
I've used the ThreadX TCP server example, as given by STM, as a starting point.
From there I've managed to set up three servers on three ports, communicating sucessfully with a python script on my PC (as a first test).
Now come the two great questions:
1)
Since my input signal may contain frequencies up to 2.5 MHz, I want to digitize this signal with the full 5 MSPS (Nyquist), which ADC3 is capable of.
The smallest internally available data-type at full resolution is uint16_t, which makes the data rate work out to be R = 16 * 5 MSPS = 80 MBit/s (worst-case, I bet, there is optimization possible ... e.g. 8 bits resolution, which halves the data-rate ... but this resolution might not be enough ... or 16 bits, and FFT afterwards, which is also sufficient, since I'm mostly interested in energy per frequency band, but initially I wanted to do this on my computer, for best flexibility).
Even if the Ethernet-IF is capable of doing 100MBit/s, the TCP layer of NetXDuo, I bet, is not.
(There is also USB OTG on this board available, but since networked devices are in my opinion more versatile, I prefer using Ethernet ... nevertheless, USB might be a backup solution)
From my measurements, a data-stream transmitted to the uC via TC from within python, and mirrored back within a thread to my PC allows for relatively consistent 20 MBit/s.
... How do I push this speed to a better level?
(I think 20MBit/s is the back-and-forth data-rate, so one-way may be faster)
However. Second question:
2)
The ADC within the STM is capable of storing data via DMA to memory.
There are two callbacks available, one at half-full, one at full buffer state.
My problem is mostly about the way of reading out the DMA and/or triggering the conversion in the first place.
How do you do this the "right" way on a RTOS (such that you don't brake the RT in RTOS)?
I see some options here, what are the pros/cons you can think of?
a) Let the ADC run freely, calling the call-backs at the respective fill-levels, triggering a TCP-transmission whenever one of the call-backs is reached
-> may lead to glitches due to insufficient speed of the TCP layer in my opinion.
b) Let the ADC conversion be triggered by a thread, which is preempted and will later TCP-transmit the data, as soon as the memory-buffer is full
-> may lead to inconsistency in the converted values, since you get burst-style conversions, with gaps in between, while the buffer is read
c) Let a thread trigger each conversion individually
-> A no-go I think, since threads are not triggered that often, to get a decent sample-frequency
d) Let a free-running ADC trigger callbacks, let a thread do the FFT, transmit within another thread the data via TCP
-> May work, but is less flexible, since the data gets crunched within the uC.
--> Are there other ways you can think of / what do you think about the ways I named here?
--> What do you think about question 1)?
Have a nice day!

Related

MPI_Lock_win / passive synchronization usage confusion

I'm trying to convert an application from using standard point-to-point MPI calls (e.g., MPI_Isend, MPI_Irecv) to using MPI-3's one-sided calls. My goal is to improve performance on my hardware, which is a system that has Infiniband hardware support and an MPI implementation that's optimized for RDMA calls. I've been told that the hardware performs particularly well with passive synchronization mode, as opposed to active synchronization (i.e., Post-Start-Complete-Wait).
However, even after reading through the MPI standard documentation and examples, I'm confused on how to actually use the calls. For context, my program has a setup phase where I will know the communication pattern and even the buffers of the send data and ultimate buffer of the receiver. So, it's straightforward to set up a window and use it.
Specifically, with passive synchronization, I'm confused about when the "receiver" knows the data in the window has been written by the sender. What I want to do is have the sender produce the message data, then call MPI_Win_lock on the window and then do an MPI_Put and then wait for completion with a MPI_Win_Unlock. But, what is an efficient / recommended way for the "receiver" (window target) of the data to know when the message data has been written? Similarly, given that the communication pattern is iterated and the same receive buffer (the target's buffer) is used multiple times, how do I know that the receiver is done consuming the buffer and it can be reused?
I can envision a couple of approaches:
I can use an MPI_Barrier after the MPI_Win_unlock and before the receiver accesses the data. (This seems that it would work but I'm skeptical that this would yield better performance than active synchronization.)
I can possibly use MPI_Lock and MPI_Unlock on the receiver (target), locking the window when the target is actually using the data so the access epoch can't start on the origin (but, is that the way it works? I've read that lock and unlock don't create critical sections in the traditional sense).
Some sort of home-grown approach where the receiver polls for some sort of a nonce to be written, knowing the data is available when that happens.
Docs for MPI_Win_lock: https://www.open-mpi.org/doc/v3.0/man3/MPI_Win_lock.3.php
In general, how does a programmer synchronize with MPI_Lock and 'MPI_Unlock` in a way that's any more efficient than the active synchronization approach? It does feel like I need to just use post-start-complete-wait, but I'm hoping you can help me find a way to try passive synchronization as well.

Oscilloscope type design with FPGA PL and PS framebuffer interface?

I am generating a certain signal (digital pulse) in one of my verilog module running on programmable logic in Xilinx Zynq chip. Signal is pretty fast, with clock of about 200MHz.
I also have a simple linux and framebuffer Qt interface running for later controlling my application.
How can I sample my signal in order to make oscilloscope like interface inside my Qt app? I want to be able to provide visual of the pulse I am generating.
What do I need to use to be able to sample enough data at such clock frequency? And how do I pass it with kernel module or mmap to Qt?
You would do best to do what most oscilloscopes do: sample the data to RAM, and only then transfer it to the processor for display/analaysis, at a more "relaxed" pace.
On the FPGA side you will need a state machine that detects some sort of start or trigger condition, probably after a bit in a mode register is set from the software side to arm it.
The state machine will then fill samples into a buffer made of one or more block rams. If you want to placing the trigger somewhere in the middle of the samples captured, you should it as a circular buffer, and have it record continuously, stopping configurable number of samples after the trigger, so that some desired number from before the trigger condition remain un-overwritten by newer ones following it.
Since FPGA block rams are typically dual port, you can simply hook the other port up to your CPU bus for readout. You will probably want a register to read the state of the sampling state machine, and if you go with the circular buffer approach, the address where it stopped, so that you can unwrap the data to a linear record of time.
Trying to do streaming realtime sampling might be possible, but would be a lot harder and it is not clear that you could do anything meaningful with the data so produced in real time. Still, if you want to try you would probably need to put a FIFO buffer in between the sampling and the processor bus, as you will probably only be able to consume data in chunks, while having to service other operations in between, so something is needed to absorb the constant-rate inflow of samples. Another approach could be to try to build a DMA engine which would write samples directly to external system ram, but that will likely be even harder.
You could also see if there are any high speed interfaces available in the CPU which you could leverage - they might be things originally intended for video, for example.
It also appears that you are measuring only a digital signal, ie, probalby one bit. If you want to handle a higher input sample rate than the FPGA fabric can support, that could mean you could potentially use something like a deserializer block at the edge of the FPGA to turn the 1-bit input stream into a slower stream of wider samples to store.
In terms of output, once you have a vector of samples in a buffer it's pretty simple to turn that into a scope/logic analyzer type plot, with as much zooming, cursor annotation, automatic measurement or whatever you like.
Also don't forget that if the intent is only to use this during development, FPGAs and their tools often have the ability to build a logic analyzer right into the design, with the data claimed over the programming interface for plotting on a PC.

SNMP network bandwith logger-monitor

I have a switch working with SNMP protocol. I want to get/log or monitor the data of bandwith for switch and connected devices/ports. the amount of incoming or outgoing data have to be calculated periodically into a log file simply.
As another option, a simple program for monitoring the network bandwith, total data traffic etc. of SNMP network may be useful for me. But it have to be so compact and light software. many programs are not freeware and their sizes are very big. Is there a solution to do that process? Thanks..
Interfaces monitored through SNMP report their data usage in the ifInOctets and ifOutOctets counters. The numbers they report can't be used directly; you need to sample them every X minutes or seconds, where X gets smaller the faster the interface. You simply subtract the previous number from the current one to give you how much traffic went by during those X minutes. Watch out for wrapping as it gets to the 32 bit integer limit (it certainly won't send negative traffic ;-) The number X will be greatly affected by how long it takes to wrap a 32 bit number at the interfaces maximum speed.
If you have a high speed switch, ideally you should actually use the ifHCInOctets and ifHCOutOctets if your switch supports it. These are 64-bit numbers and won't wrap frequently and thus X can become much much larger. But not all devices support them.

How to synchronize media playback over an unreliable network?

I wish I could play music or video on one computer, and have a second computer playing the same media, synchronized. As in, I can hear both computers' speakers at the same time, and it doesn't sound funny.
I want to do this over Wi-Fi, which is slightly unreliable.
Algorithmically, what's the best approach to this problem?
EDIT 1
Whether both computers "play" the same media, or one "plays" the media and streams it to the other, doesn't matter to me.
I am certain this is a tractable problem because I once saw a demo of Wi-Fi speakers. That was 5+ years ago, so I'm figure the technology should make it easier today.
(I myself was looking for an application which did this, hoping I wouldn't have to write one myself, when I stumbled upon this question.)
overview
You introduce a bit of buffer latency and use a network time-synchronization protocol to align the streams. That is, you split the stream up into packets, and timestamp each packet with "play later at time T", where T is for example 50-100ms in the future (or more if the network is glitchy). You send (or multicast) the packets on the local network, to all computers in the chorus. The computers will all play the sound at the same time because the application clock is synced.
Note that there may be other factors like OS/driver/soundcard latency which may have to be factored into the time-synchronization protocol. If you are not too discerning, the synchronization protocol may be as simple as one computer beeping every second -- plus you hitting a key on the other computer in beat. This has the advantage of accounting for any other source of lag at the OS/driver/soundcard layers, but has the disadvantage that manual intervention is needed if the clocks become desynchronized.
hybrid manual-network sync
One way to account for other sources of latency, without constant manual intervention, is to combine this approach with a standard network-clock synchronization protocol; the first time you run the protocol on new machines:
synchronize the machines with manual beat-style intervention
synchronize the machines with a network-clock sync protocol
for each machine in the chorus, take the difference of the two synchronizations; this is the OS/driver/soundcard latency of each machine, which they each keep track of
Now whenever the network backbone changes, all one needs to do is resync using the network-clock sync protocol (#2), and subtract out the OS/driver/soundcard latencies, obviating the need for manual intervention (unless you change the OS/drivers/soundcards).
nature-mimicking firefly sync
If you are doing this in a quiet room and all machines have microphones, you do not even need manual intervention (#1), because you can have them all follow a "firefly-style" synchronizing algorithm. Many species of fireflies in nature will all blink in unison. http://tinkerlog.com/2007/05/11/synchronizing-fireflies/ describes the algorithm these fireflies use: "If a firefly receives a flash of a neighbour firefly, it flashes slightly earlier." Flashes correspond to beeps or buzzes (through the soundcard, not the mobo piezo buzzer!), and seeing corresponds to listening through the microphone.
This may be a bit awkward over very large room distances due to the speed of sound, but I doubt it'll be an issue (if so, decrease rate of beeping).
The synchronization is relative to the position of the listener relative to each speaker. I don't think the reliability of the network would have as much to do with this synchronization as it would the content of the audio stream. In order to synchronize you need to find the distance between each speaker and the listener. Find the difference between each of those values and the value for the farthest speaker. For each 1.1 feet of difference, delay each of the close speakers by 1ms. This will ensure that the audio stream reaches the listener at the same time. This all assumes an open area, as any in proximity to your scenario will generate reflections of the audio waves and create destructive interference. Objects within the area may also transmit sound at a slower speed resulting in delayed sound of their own.

Why I cannot get equal upload and download speed on symmetrical channel?

I'm assigned to a project where my code is supposed to perform uploads and downloads of some files on the same FTP or HTTP server simultaneously. The speed is measured and some conclusions are being made out of this.
Now, the problem is that on high-speed connections we're getting pretty much expected results in terms of throughput, but on slow connections (think ideal CDMA 1xRTT link) either download or upload wins at the expense of the opposite direction. I have a "higher body" who's convinced that CDMA 1xRTT connection is symmetric and thus we should be able to perform data transfer with equivalent speeds (~100 kbps in each direction) on this link.
My measurements show that without heavy tweaking the code in terms of buffer sizes and data link throttling it's not possible to have same speeds in forementioned conditions. I tried both my multithreaded code and also created a simple batch file that automates Windows' ftp.exe to perform data transfer -- same result.
So, the question is: is it really possible to perform data transfer on a slow symmetrical link with equivalent speeds? Is a "higher body" right in their expectations? If yes, do you have any suggestions on what should I do with my code in order to achieve such throughput?
PS.
I completely re-wrote the question, so it would be obvious it belongs to this site.
CDMA 1x consists of up to 15 channels of 9.6kbps traffic. This results in a total throughput of 144kbps.
Two channels are used for command and control signals (talking to base stations, associating/disassociating, SMS traffic, ring signals, etc).
That leaves you with up to 124.8kbps.
--> Each channel is one way. <--
They are dynamically switched and allocated depending on the need.
Generally you'll get more download than upload because that's the typical cell phone modem usage. But you'll never get more than 120kbps total aggregate bandwidth.
In practise, due to overhead of 1xRTT encoding, error correction, resends, etc, you'll typically experience between 60kbps and 90kbps even if you have all the channels possible.
This means that you can probably only get 30kbps-60kbps of upload and download simultaneously.
Further, due to switching the channels dynamically (and the fact that the base station controls this more than your modem - they need to manage base station channels carefully to keep channels free for voice calls) you'll lose time when it switches channels - it's not an instantaneous process.
So - 1xRTT can, in theory, give you 124kbps one way, but due to overhead, switching times, base station capacity, or the phone company simply limiting such connections for other reasons, you can't depend on a symmetrical link.
NOTE:
This will vary to some degree based on the provider and the modem. For instance, some modems have 16 channels, and some providers support 16 channels. In some cases those modems and providers work well together and can provide a full 144kbps aggregate raw bandwidth to the application, with only one dedicated channel (which has to work pretty hard) to deal with control, switching, and other issues. Even then, though, with the overhead of the modem communications, then the overhead of PPP, then the overhead of IP, then the overhead of TCP, you're still looking at maybe 100-120kbps total bandwidth, both up and down.
Lastly, no provider yet supports transparent transfer of IP traffic. In other words if you're modem is moving, the modem will switch to a new base station, but you'll completely drop the PPP session and have to restart it, as well as all the TCP sessions and such. You typically won't get the same IP address, and so your TCP sessions will not recover gracefully.
The "fun" aspect to this twist is that this can happen even if you aren't moving. If one base station gets loaded down, you may be transferred to another base station if you are close enough - there are other things that may make your modem transfer even without you moving. So make sure you take this into account, since you seem to be keen on maintaining a full duplex, symmetric channel open. It's tough to write stuff that will recover gracefully, nevermind anticipate it and do it quickly. You would do well to work very closely with a modem manufacturer (such as Kyocera) on this - otherwise you won't get the documentation on how to control the modem chipset at the low level that you need.
-Adam
I think the whole drama with high equal speeds on both directions is because my higher body thinks that they have 144 kbps on uplink AND 144 kbps on DOWNLINK (== TWO pipes). Whereas in reality we have 144 kbps of ONE pipe which is switching directions when I transfer files.
Comment me if I right or wrong, please.

Resources