MCU communication speed - networking

This question may look a little dumb, sorry about that.
I've seen that modern routers communicate at a few Gbps, yet modern MCUs run at clock speeds below 4 GHz. Please explain how they do this magic.

No magic at all :) They use specialized hardware which is only supervised by a (sometimes quite small) processor. So the processor doesn't have to process every bit and byte; it only tells the specialized hardware how to handle those bytes.
Take a look at a policeman standing in the middle of a crossing. He (the controller) just has to signal the drivers (the specialized hardware) when and where to go. They handle the rest of it.
Also, this hardware can use parallelization in data processing, meaning it can handle many bytes within one cycle, multiplying the throughput compared to the actual clock speed.
And we now have hybrid designs where a CPU and specialized hardware modules sit on the same chip (like a microcontroller with RS-232, SPI and Ethernet support). Same thing.
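To make the division of labour concrete, here is a hedged C sketch contrasting the two approaches; every register and bit name in it is a made-up placeholder, not any real part's API.

```c
/* Illustrative sketch only: PORT_OUT, PIN_CLK, PIN_MOSI, SPI_STATUS,
 * SPI_TX_READY and SPI_DATA are placeholders for whatever your MCU's
 * datasheet actually defines. */
#include <stdint.h>

#define PORT_OUT     (*(volatile uint8_t *)0x1000u) /* placeholder GPIO output register */
#define PIN_CLK      (1u << 0)
#define PIN_MOSI     (1u << 1)
#define SPI_STATUS   (*(volatile uint8_t *)0x2000u) /* placeholder SPI status register */
#define SPI_TX_READY (1u << 7)
#define SPI_DATA     (*(volatile uint8_t *)0x2001u) /* placeholder SPI data register */

/* The CPU doing everything itself: shifting one byte out bit by bit. */
void send_byte_bitbang(uint8_t byte)
{
    for (int i = 7; i >= 0; i--) {
        if (byte & (1u << i))
            PORT_OUT |= PIN_MOSI;          /* drive the data line */
        else
            PORT_OUT &= (uint8_t)~PIN_MOSI;
        PORT_OUT |= PIN_CLK;               /* clock the bit out... */
        PORT_OUT &= (uint8_t)~PIN_CLK;     /* ...costing tens of CPU cycles per bit */
    }
}

/* The CPU as the policeman: hand the byte over and let the dedicated
 * shift-register hardware clock all eight bits out on its own. */
void send_byte_hw(uint8_t byte)
{
    while (!(SPI_STATUS & SPI_TX_READY))
        ;                                  /* wait for room in the transmit register */
    SPI_DATA = byte;
}
```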
Hope this helps!

Related

What kind of code/instructions can make an MCU run at its maximum power consumption

I am trying to evaluate the maximum power consumption of an MCU (Renesas RX72N or RX651). It's not battery powered and it never runs in sleep mode. I am thinking I can write a piece of benchmark code, in C of course, that does a lot of complex calculations. While the MCU is executing the calculations, I can measure the current it draws and deduce its maximum power consumption. Is my understanding correct? If so, what kind of code do you think I should write, or is there already some open source code I can use for this purpose?
Thanks in advance.
-woody
In general you just have to experiment; there is no fixed answer. First off, check the datasheet: you certainly want the flash on and all the peripherals out of reset. The fastest clock is probably good, but understand that running from flash does not necessarily scale up; it depends on the part (does it have wait states with a table relative to processor clock speed?).
Changing states burns power, so you want to flip as many as you can. But it could be that you get more out of flipping GPIO or other external pins than out of trying to make the processor core pull more power.
You will want a nice meter for this one that can measure milliamps or milliwatts.
Things like multiply and divide in theory take a good percentage of chip space if they are implemented in one clock, but some of these MCUs naturally won't do that; the instruction will be multi-clock, or at times the chip vendor can choose (if they buy an Arm core, for example). You may also need to play with data patterns to get more consumption. For core stuff you probably want to do as much register-based work as possible versus reading/writing things to memory.
You probably want to write the test code in assembly language. Start with an infinite loop, a branch-to-self, and measure power/current. Then complicate the loop with additional memory cycles or ALU operations and see if you can detect a power difference (you may find that you cannot make much of a difference with the core alone). Depending on the MCU design you may/should get better execution performance running from RAM, but it depends of course; RAM tends to be faster than flash in MCUs. Then do things like flipping GPIO pins, leaving them high, etc. If you have LEDs, turn them all on, naturally.
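As a starting point, a rough C sketch of such a stress loop might look like the following; the GPIO register address is a placeholder, and on a real RX72N/RX651 you would use the vendor's port definitions and check the generated assembly to see what the compiler actually produced.

```c
/* Rough stress-loop sketch for current measurement. Expect to tweak the mix
 * of operations per the advice above; this is not a definitive benchmark. */
#include <stdint.h>

#define GPIO_PORT_TOGGLE (*(volatile uint32_t *)0x40001000u) /* placeholder address */

volatile uint32_t sink; /* volatile so the compiler can't delete the math */

void stress_loop(void)
{
    uint32_t a = 0x12345678u, b = 0x9E3779B9u;

    for (;;) {
        /* keep the multiplier and ALU busy with register-to-register work */
        a = a * b + 0xA5A5A5A5u;
        b ^= (a << 7) | (a >> 25);
        sink = a ^ b;                   /* occasional memory write */

        GPIO_PORT_TOGGLE = 0xFFFFFFFFu; /* flip external pins: state changes burn power */
    }
}
```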
There is no one answer for this, so no single benchmark or solution pushes any random chip the hardest. Assume that if it is possible to see a noticeable difference for a particular chip, the test will be specific to that chip and not necessarily the same solution for other chips from the same company, much less chips from other companies.

efm32 - efm32gg380f1024 - rtc features

I'm working with the efm32gg380f1024 on a project.
I currently use the BURTC timer (ULFRCO clock) as the tick source and I would like to use the normal RTC timer (LFRCO clock) as well.
Do they exclude each other, or can I use both at the same time?
I was wondering if someone already has experience with the GG series from Silicon Labs and could give me some hints.
I'm also wondering: I have both an LFXO and an HFXO on my board, currently unused. When I initialize the external clock setup, can I disable the internal RC oscillators, since they are not used and just consume energy?
The target is battery powered and every µWs counts.
thanks
You have a couple of questions here.
Yes you can use the LETIMER (which is what I think you mean when you say LERTC) peripheral independently of the RTC. They are separate peripherals, but note that the LETIMER is clocked from the same clock as the RTC.
As for using the external crystal oscillators, you need only enable the clocking sources that you actually use. However, clocking sources and entry/exit of the various low power energy modes interact. It can be rather tricky and complex. I suggest you use emlib to control these peripherals and in particular to enter and exit lower power energy modes.
If power consumption is important to you, note that high frequency clocking of the processor core consumes lots of power. Of course this must be traded off with how long you remain awake before going back to a lower power mode and with any real time requirements you might have for processing. Pushing off work to peripherals and using DMA to perform data movement is generally a win. Expect to do a good bit of tuning and you will need ways to accurately measure the power consumption. Using internal RC oscillators for clocking may be sufficient and a lower power approach. The low frequency external crystals tend to be 32 KHz clock crystals and don't consume that much power. They are a good alternative to the internal RC oscillators if you need better frequency stability.
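For illustration, a minimal emlib-style sketch of that kind of clock housekeeping might look like this; treat the exact function and enum names as assumptions to verify against your emlib version and the EFM32GG reference manual.

```c
/* Minimal sketch of the kind of clock setup emlib supports; check the exact
 * API names against your SDK before relying on any of this. */
#include "em_chip.h"
#include "em_cmu.h"
#include "em_emu.h"

void low_power_clock_setup(void)
{
    CHIP_Init();

    /* Start the 32.768 kHz crystal and route it to the low-frequency A branch
     * (which clocks the RTC), instead of the internal LFRCO. */
    CMU_OscillatorEnable(cmuOsc_LFXO, true, true);
    CMU_ClockSelectSet(cmuClock_LFA, cmuSelect_LFXO);

    /* Enable the low-energy peripheral interface clock and the RTC clock. */
    CMU_ClockEnable(cmuClock_CORELE, true);
    CMU_ClockEnable(cmuClock_RTC, true);

    /* An oscillator that nothing uses can be switched off to save energy. */
    CMU_OscillatorEnable(cmuOsc_LFRCO, false, false);

    /* ... configure the RTC/BURTC here, then sleep between events ... */
    EMU_EnterEM2(true);   /* EM2 keeps the low-frequency clocks running */
}
```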

Is DMA the Correct Way to Receive High-Speed Digital Data on a Microprocessor?

I have been using the Teensy 3.6 microcontroller board (180 MHz ARM Cortex-M4 processor) to try and implement a driver for a sensor. The sensor is controlled over SPI and when it is commanded to make a measurement, it sends out the data over two lines, DOUT and PCLK. PCLK is a 5 MHz clock signal and the bits are sent over DOUT, measured on the falling edges of the PCLK signal. The data frame itself consists of 1,024 16-bit values.
My first attempt was a relatively naïve approach: I attached an interrupt to the PCLK pin looking for falling edges. When it detects a falling edge, the ISR sets a bool indicating a new bit is available and sets another bool to the value of the DOUT line. The main loop of the program assembles a uint16_t value from these bits and collects 1,024 of these values for the full measurement frame.
However, this program locks up the Teensy almost immediately. From my experiments, it seems to lock up as soon as the interrupt is attached. I believe that the microprocessor is being swamped by interrupts.
I think that the correct way of doing this is by using the Teensy's DMA controller. I have been reading Paul Stoffregen's DMAChannel library but I can't understand it. I need to trigger the DMA measurements from the PCLK digital pin and have it read in bits from the DOUT digital pin. Could someone tell me if I am looking at this problem in the correct way? Am I overlooking something, and what resources should I view to better understand DMA on the Teensy?
Thanks!
I put this on the Software Engineering Stack Exchange because I feel that this is primarily a programming problem, but if it is an EE problem, please feel free to move it to the EE SE.
Is DMA the Correct Way to Receive High-Speed Digital Data on a Microprocessor?
There is more than one source of 'high speed digital data'. DMA is not the globally correct solution for all data, but it can be a solution.
it sends out the data over two lines, DOUT and PCLK. PCLK is a 5 MHz clock signal and the bits are sent over DOUT, measured on the falling edges of the PCLK signal.
I attached an interrupt to the PCLK pin looking for falling edges. When it detects a falling edge, it sets a bool that a new bit is available and sets another bool to the value of the DOUT line.
This approach would be called 'bit banging': you are using the CPU to physically sample the pins. It is a worst-case solution that I see many experienced developers implement, and it will work with any hardware connection. Fortunately, the Kinetis K66 has several peripherals that may be able to assist you.
Specifically, the FTM, CMP, I2C, SPI and UART modules may be useful. These hardware modules can reduce the workload from processing each bit to processing groups of bits. For instance, the FTM supports a capture mode. The idea is to ignore the PCLK signal and just measure the time between edges on DOUT. These times will be multiples of the bit period. If the timer captures a two-bit period, then you know that two consecutive ones or zeros were sent.
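As a sketch of that capture idea (the names and the timer tick rate are assumptions, not the K66's actual FTM API), decoding the captured edge times back into bits could look like this:

```c
/* Turn FTM-style capture timestamps (one per DOUT edge) into a bit stream.
 * The time between edges divided by the bit period gives the run length of
 * identical bits; each edge toggles the line level. A trailing run with no
 * closing edge is not recovered by this sketch. */
#include <stddef.h>
#include <stdint.h>

/* edge_ticks: capture timestamps in timer ticks, first_level: line level
 * right after the first captured edge. Returns the number of bits written. */
size_t decode_edges(const uint32_t *edge_ticks, size_t n_edges,
                    uint32_t ticks_per_bit, int first_level,
                    uint8_t *bits, size_t max_bits)
{
    size_t out = 0;
    int level = first_level;

    for (size_t i = 1; i < n_edges; i++) {
        uint32_t delta = edge_ticks[i] - edge_ticks[i - 1];
        /* round to the nearest whole number of bit periods */
        uint32_t run = (delta + ticks_per_bit / 2) / ticks_per_bit;
        for (uint32_t k = 0; k < run && out < max_bits; k++)
            bits[out++] = (uint8_t)level;
        level ^= 1;   /* the next interval has the opposite level */
    }
    return out;
}
```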
Also, your signal looks like SSI, which is a 'digital audio' interface. Unfortunately, the K66 doesn't have an SSI module. Typical I2C is open drain and always has a start bit and a fixed word size. It may be possible to use it if you have some knowledge of the data and/or can attach some circuitry to fake some bits (to be removed later).
You could use the UART and the time between characters to capture data. The time will correspond to a run of bits that aren't the start bit. However, it looks like this UART module requires stop bits (the SIM features are probably very limited).
Once you do this, the decision between DMA, interrupts and polling can be made. Nothing is faster than polling if the CPU actually uses the data. DMA and interrupts are needed if you have to multiplex the CPU with the data transfer. DMA is better if the CPU doesn't need to act on most of the data, or if the work the CPU is doing is not memory intensive (number crunching). Interrupt cost depends on your context-save overhead, which can be minimized depending on the facilities your main line uses.
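To illustrate the polling end of that spectrum, here is a hedged C sketch that assembles the 1,024 16-bit words by busy-waiting on PCLK. The GPIO register address and bit masks are made up; at 5 Mbit/s on a 180 MHz core you only have roughly 36 cycles per bit, so this assumes direct port reads and interrupts disabled while capturing.

```c
/* Minimal polling sketch for the 5 MHz PCLK/DOUT stream described above.
 * On a real Teensy these reads would be direct GPIO input-register accesses
 * (or digitalReadFast with constant pins); the address below is invented. */
#include <stdint.h>

#define GPIO_IN      (*(volatile uint32_t *)0x40000000u)  /* placeholder input register */
#define PCLK_MASK    (1u << 5)                            /* placeholder bit */
#define DOUT_MASK    (1u << 6)                            /* placeholder bit */
#define READ_PCLK()  ((GPIO_IN & PCLK_MASK) != 0u)
#define READ_DOUT()  ((GPIO_IN & DOUT_MASK) != 0u)

#define FRAME_WORDS 1024

void capture_frame(uint16_t frame[FRAME_WORDS])
{
    for (int w = 0; w < FRAME_WORDS; w++) {
        uint16_t value = 0;
        for (int bit = 0; bit < 16; bit++) {
            while (!READ_PCLK()) ;   /* wait for clock high */
            while (READ_PCLK()) ;    /* falling edge: data is valid now */
            value = (uint16_t)((value << 1) | (READ_DOUT() ? 1u : 0u));
        }
        frame[w] = value;
    }
}
```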
Some glue circuitry to adapt the signal to one of the K66 modules could go a long way toward a more efficient solution. If you can't change the signal, another (NXP?) SoC with an SSI module would work well. The NXP modules usually support chaining to an eDMA module as well as interrupts.

What does CPU frequency represent in an Arduino board?

I'm new to Arduino and microcontrollers.
I was studying the specs and found that even the same board may have different frequencies with different input voltages (3.3V vs 5V). So the question is, what does frequency represent? Does it represent how many lines of assembly code it's able to run? Or the maximum PWM frequencies it's able to output?
A further question would be, if I'm looking for a board for a specific project, how do I decide which frequency I will need a priori, instead of trying everything out and see which one works?
What makes me more confused is that when it comes to computer CPUs, it seems that lower frequency CPUs can actually run faster than higher frequency ones (e.g. Intel). So how do I actually know how fast a microcontroller can run?
By frequency we mean the frequency of the CPU clock. Say your Arduino Uno runs on 16 MHz, which is 16,000,000 Hertz.
That means there are 16 million clock cycles per second. The CPU executes the machine code of the program. One assembler instruction can take any number of CPU cycles to execute, usually between 1 and 4 cycles for simple things, and a little more for heavy arithmetic and writes to memory. So the clock frequency is a rough estimate of how many "lines of assembler" (that is, machine instructions) it can run per second. A slightly better measurement is the MIPS value, "millions of instructions per second". There are other benchmark types for CPUs that are more accurate.
If you take a look at the instruction set manual for the AVR architecture, you can see the number of cycles each instruction needs: (link: http://www.atmel.com/images/Atmel-0856-AVR-Instruction-Set-Manual.pdf)
So for an ADD Rd, Rr instruction, an AVR CPU needs 1 clock cycle.
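To put numbers on that, here is a small worked example in C; the CALL/RET cycle counts are approximate and worth verifying against the manual linked above.

```c
/* Rough cycle math for a 16 MHz AVR (illustrative only).
 * One clock cycle = 1 / 16,000,000 s = 62.5 ns, so a 1-cycle instruction
 * such as ADD gives a theoretical peak of 16 million such instructions/s. */
#include <stdio.h>

int main(void)
{
    const double f_cpu_hz     = 16e6;       /* Arduino Uno clock */
    const double ns_per_cycle = 1e9 / f_cpu_hz;
    const int add_cycles      = 1;          /* ADD Rd,Rr: 1 cycle per the manual */
    const int call_ret_cycles = 4 + 4;      /* CALL + RET: roughly 4 cycles each */

    printf("One cycle:          %.1f ns\n", ns_per_cycle);
    printf("ADD:                %.1f ns\n", add_cycles * ns_per_cycle);
    printf("CALL+RET overhead:  %.1f ns\n", call_ret_cycles * ns_per_cycle);
    return 0;
}
```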
Take a desktop Intel CPU for example. It is common these days for them to have a clock frequency of 2 GHz or more, which is 2 billion cycles per second compared to the 16 million cycles per second of the Arduino's AVR CPU. So the Intel CPU beats the Arduino by far. Then again, the Arduino is designed for completely different things: it is a small microcontroller with low overhead, runs no OS, etc. The use case for such a CPU (and architecture) is just different, which makes comparing them unfair. There are many other factors at play, like multiple cores (4 Intel cores vs. 1 AVR), instruction pipelining, the speed of your memory/RAM, etc. It is really hard to compare one CPU to another across every possible use case, but for "general purpose computing", desktop CPUs (AMD, Intel, x64 architecture) far outrun the processing power of a mere Arduino AVR CPU.
I hope this clears up some confusion.
I think one confusion you have may be chip specific. I am not going to look it up right now, but I do remember seeing this: a chip's spec may say that for this input voltage range it can handle this clock frequency, and for that voltage range it cannot. I think on SparkFun boards the 3.3 V versions run at 8 MHz and the 5.0 V versions at 16 MHz, or something like that. Anyway, that is not generally the case; it is a chip-by-chip, vendor-by-vendor thing, and that is why you have to read the datasheet. It has nothing to do with Arduinos or AVRs specifically, it is just a general chip-design thing.
How do you know how fast your microcontroller can run? That is a very loaded question and depends on your definition of fast. If it is simply "what clock frequencies can I use", then just read the datasheet for that part and, depending on your board design, choose from what is available. If you do not have any external clocks your choices may be more limited, and you may or may not have a PLL you can use to multiply the clock source.
If your definition of fast is "how fast can I perform this task", how many whatevers per second, or how much wall-clock time it takes to complete some specific task, then that is a benchmark problem, and there are so many variables that there is no real answer. Yes, it is very true that one x86 can have a lower clock and run faster than another x86; historically the newer ones can do less per clock than older ones for the same binaries, and you then have to tune the compilation for the newer chip to get back some of your MIPS per MHz. But that is in part because you are using a different chip design that merely speaks the same language (machine code). A tall person might recite a poem faster than a short person, both using English and the same poem; it has nothing to do with them being short or tall, just that they are different humans.
There are different AVR core variations, but not remotely on par with the different x86 microarchitectures. So comparing a tiny to an xmega, the xmega can probably run "faster" at the same clock rate simply because it has more registers or a bigger address space, etc. But instructions per second is probably not really different; it could be, but my guess is not by much.
Then there is the compiler. The compiler plays a huge role in how "fast" your code runs: change compilers, compiler versions or compiler settings, and the machine code produced from the same high-level source code (C, for example) can vary greatly, and as a result can have dramatic effects on the "speed" of the code. Take the Dhrystone, for example: it is very easy to demonstrate that the exact same source code on the exact same chip/board, at the same clock rates, can execute at vastly different speeds based on using different compilers, versions or command-line settings, which kind of proves that the godfather of benchmarks is basically useless at providing any meaningful information.
Microcontrollers make the problem much worse, as you are often running the program out of flash, and many (not all, but many) have the ability to divide or multiply the clock, or both, yet the flash is not always designed for the full range. You might have a chip that boots on an internal clock at 8 MHz but lets you use the PLL to multiply that up to, say, 80 MHz. It is not uncommon, though, for the flash on a chip like that to be limited to, say, 16 MHz. So at 8 MHz the flash can deliver an item (say, an instruction) every CPU clock, but at 20 MHz you have to add a wait state, and although the CPU is running much faster you can only feed it at 16 MHz, so it is waiting around more and then acts fast when it finally gets something. Is clocking up really making you "faster", or is it making you slower? Certainly at just under 16 MHz on this fantasy chip you can keep zero wait states, so it really is faster than 8 MHz (not necessarily twice as fast, as there are other factors). Just at or above 16 MHz, though, you take a big performance hit compared to just under 16 MHz. At just under 32 MHz it is pretty fast again compared to just over 16 MHz, then just over 32 MHz another wait state kicks in and it is much slower again even though the clock is basically the same, and so on.
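To make the ordering concrete, here is a purely hypothetical C sketch for that imaginary 16 MHz-flash / 80 MHz-PLL chip; the register names and bit layout are invented for illustration only.

```c
/* Illustrative only: FLASH_WAITSTATES and PLL_CTRL are made-up registers for
 * the imaginary chip described above. The point is the ordering: add wait
 * states *before* raising the clock, or the flash can't keep up. */
#include <stdint.h>

#define FLASH_WAITSTATES (*(volatile uint32_t *)0x40002000u) /* placeholder */
#define PLL_CTRL         (*(volatile uint32_t *)0x40002004u) /* placeholder */

void clock_up_to_80mhz(void)
{
    /* 80 MHz core against 16 MHz flash -> roughly 4 extra wait states */
    FLASH_WAITSTATES = 4;

    /* only now multiply the 8 MHz internal clock up to 80 MHz */
    PLL_CTRL = (1u << 31) | 10u;   /* invented enable bit + x10 multiplier */
}
```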
Then there is the fetching: how does the CPU actually fetch? Take an ARM that fetches a chunk of data per fetch transaction, even if it is not going to execute all of it. If you branch to 0x1004 and at that address there is a branch to 0x2008, the core might fetch 0x10 bytes from 0x1000 to 0x100F, then extract the word/instruction at 0x1004, decode it, find it is a branch, and then read 0x10 bytes from 0x2000: basically reading 0x20 bytes to find two instructions. Take two instructions: if both are in the same 0x10 bytes, good; if one is at 0x100C and the other at 0x2000, that is a performance hit. Apply this internal behaviour to an application and all of its jumping around: changing one line of code, or adding or removing a single nop in the bootstrap (causing the alignment of the program in the address space to change), can cause anywhere from a tiny to a large change in performance. Swapping two helper functions in the source text of your program, so that they land at different addresses once compiled, can have little to major performance effects without actually changing the functions themselves.
So chasing performance is, in one respect, a foolish task. In another respect, all that matters is your program as written, with the compiler you are using, on the hardware you are using: it is as fast as it is, and there are things you can do to make that code faster on that compiler, on that target, on that day, by changing compiler settings, the code, or both. Ideally you build your final firmware, performance-test that, and never build again, because if you build a year or two later it may be on a different host with a different compiler or compiler version, and all bets are off on performance.
How do you pick a board: how much flash, RAM, features, clock rate? A lot of it is experience gained by trial and error. Fortunately you live in a time when you can literally try hundreds of boards, all of which cost anywhere from a few bucks to maybe 10 or 20 each, across different CPU architectures and different chip vendors. There are many compiler and even language choices available; basically there are too many easy-to-acquire options, unlike back in the day when the parts were fairly cheap but you may have had to build your own board, write your code in assembly, maybe even create your own assembler, and have a ROM programmer that cost hundreds to thousands of dollars. So go with the AVR you have and play with its features, play with the compiler and the code. Do experiments to see if there are fetch effects or not. If you have clocking choices, mess with those and see what happens.
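If you want to experiment with such alignment effects, one hedged approach (GCC/Clang attributes; whether it changes anything at all is entirely chip- and toolchain-specific) is to pin the hot functions to a fixed boundary and compare timings with and without it.

```c
/* Force the two helpers onto (hypothetical) fetch-line boundaries so their
 * placement stays stable while you experiment; noinline keeps the calls real. */
#include <stdint.h>

__attribute__((aligned(16), noinline))
uint32_t hot_helper_a(uint32_t x) { return x * 3u + 1u; }

__attribute__((aligned(16), noinline))
uint32_t hot_helper_b(uint32_t x) { return (x >> 1) ^ x; }

/* Time this loop with and without the aligned attributes (or after swapping
 * the two helpers in the source) to see whether placement matters on your chip. */
uint32_t run(uint32_t seed, int n)
{
    for (int i = 0; i < n; i++)
        seed = hot_helper_b(hot_helper_a(seed));
    return seed;
}
```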
Of course all of this starts with reading the chip documentation from the vendor.

Serial Comms baud rate, parity and stop bits. Which options to use and when?

I'm trying to pick up some serial comms for a new job I am starting. I have done some reading, which has helped a lot; however, a lot of the reading tells you about the specification of serial comms and what everything is, but not when it is best to use particular options.
My searches for this information so far only seem to pull in the spec; perhaps as a novice I am searching for the wrong terms.
My questions then!
Baud Rate - I have read this is signal changes per second and is often mislabelled as bits per second. Is this essentially bits per second including the frame data if asynchronous, and actually bits per second if synchronous?
Parity - Even/Odd. Is there any difference at all between the two? I'm thinking in terms of efficiency or similar. Does this only still exist for compatibility's sake?
Stop Bits - I have read so far you can have 1 or 2 stop bits. In C# there seems to be an option for 1.5 too. I can't find anything on why you would want/need more than 1.
If anyone can advise on these points, or point me to some recommended reading material I would be very grateful.
Thanks for reading.
You very rarely have a choice; you must make your settings compatible with the settings the device uses. If you don't know them, you need to look in a manual or pick up a phone. Do keep in mind that it is increasingly rare to work with a real serial port device, one that uses a UART. Most commonly you actually talk to an emulated serial port, implemented by a USB or Bluetooth device driver. The settings you use don't matter much in that case, since the actual signaling is implemented by the underlying bus.
If you can configure the device then basic guidelines are:
Baudrate is directly related to the length of the cable and the amount of electrical interference that's present. You have to go slower when you get bit errors. The RS-232 spec only allows for a maximum of 50 ft at 9600 baud.
Parity ought to be used when you don't use an error-correcting protocol. It does not matter whether you pick Odd or Even (see the small sketch after these guidelines). Odd people pick odd, it's their prerogative.
Stop bits is usually 1. Picking 1.5 or 2 helps a bit to relieve pressure on a device whose interrupt response times are poor, something you would otherwise notice as data loss.
Databits is almost always 8, sometimes 7 if the device only handles ASCII codes.
Handshaking is an important setting that never stops causing trouble, since many programmers just overlook it. Modern computers are almost always fast enough not to need it, but that's not necessarily true for devices. The most basic stay-out-of-trouble configuration is to turn DTR on when you open the port and to tell the device driver to take care of RTS/CTS handshaking. Xon/Xoff handshaking is sometimes used; it depends on the device.
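As a small aside on the even/odd point above, here is a tiny C sketch showing that both parities cost exactly the same to compute; only the expected total is flipped.

```c
/* Computing the parity bit for one data byte. Even parity: the bit makes the
 * total count of 1s even; odd parity makes it odd. Same popcount either way. */
#include <stdint.h>

int parity_bit(uint8_t byte, int odd)
{
    int ones     = __builtin_popcount(byte);  /* GCC/Clang builtin */
    int even_bit = ones & 1;                  /* bit needed for even parity */
    return odd ? !even_bit : even_bit;        /* odd parity just flips the target */
}
```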
A good 90% of the battle is won by implementing solid error checking. It is almost always skimped on, which is a bad idea. It is very important for serial port devices, since they have no error-correcting capabilities themselves and only very weak error detection. Always make sure you can detect and properly report overrun, parity and framing errors, and test them by getting the settings intentionally wrong.
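As one concrete way of applying these guidelines, here is a hedged POSIX termios sketch for 9600 baud, 8 data bits, odd parity, 1 stop bit, with RTS/CTS handshaking and input parity checking enabled. CRTSCTS and cfmakeraw are common extensions rather than strict POSIX, and the device path is just an example.

```c
/* Open and configure a serial port on a Linux-like system: 9600 8-O-1 with
 * hardware flow control and parity checking on received data. */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int open_serial(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return -1; }

    struct termios tio;
    if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); close(fd); return -1; }

    cfmakeraw(&tio);                 /* raw bytes, no line editing or translation */
    cfsetispeed(&tio, B9600);
    cfsetospeed(&tio, B9600);

    tio.c_cflag &= ~CSIZE;
    tio.c_cflag |= CS8;              /* 8 data bits */
    tio.c_cflag |= PARENB | PARODD;  /* odd parity */
    tio.c_cflag &= ~CSTOPB;          /* 1 stop bit */
    tio.c_cflag |= CRTSCTS;          /* RTS/CTS hardware handshaking */
    tio.c_cflag |= CREAD | CLOCAL;
    tio.c_iflag |= INPCK;            /* check parity on input */

    if (tcsetattr(fd, TCSANOW, &tio) < 0) { perror("tcsetattr"); close(fd); return -1; }
    return fd;
}

int main(void)
{
    int fd = open_serial("/dev/ttyUSB0");   /* example device name */
    if (fd >= 0) close(fd);
    return 0;
}
```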
