Arduino measuring power to run code - arduino

I have c code running on bear metal (no OS). The code takes in some sensor data, performs a computation, forms a packet and transmits. The board is battery powered.
I'm interested in knowing the energy consumed for each operation in Jules. Is this possible? How would one go about doing it?

The number of joules used per instruction depends upon the processor you are using and which instruction you are looking at. I believe the ARM and the Atmel AVR processors have no real hardware power management which makes things simpler.
How much energy an instruction uses has to do with how much and what type of on-silicon circuitry it uses. This means that trying to theoretically compute the number of joules will be complicated since it is not simply related to the number of cycles the instruction uses.
So you’ll have to do it experimentally. Here’s what I’d do.
Remove all compiler optimizations
Do a frequency analysis to find your hot operations and pick out the most used. (I’m assuming we aren’t talking ASM instructions but ‘C’ instructions.)
Write a loop that repeats the instruction, say, 20 times, and have the loop run for several seconds at a minimum
Replace your battery with a power supply.
Use the series resistor to measure power as mentioned in the comment but (of course) scale the voltage appropriately.
Run the looping program and get a statistical sampling of the power.
Do this for all of your hot operations.
Compute power usage for your program
Validate the power usage against real power measurements of the execution of your program
Adjust (i.e. normalize) your computations as appropriate
You’ll also have to take into account the memory hierarchy. Accessing off chip memory takes energy. When operations or data are cached, it’s going to change your energy equation.
I figure this should work but don’t know. Good luck.

Related

What kind of code/instructions can make a MCU run at it's maximum power consumption

I am trying to evaluate the maximum power consumption of a MCU (Renesas RX72N or RX651). It's not battery powered and it's never run in sleep mode. I am thinking I can write a piece of benchmark code, in C of course, that should do a lot of complex calculations. While the MCU is executing the calculations, I can measure the current drawn by the MCU and deduce what's its maximum power consumption level. Is my understand correct? If so, do you think what kind of code I should write or is there already some open source code to use for the purpose?
Thanks in advance.
-woody
In general you just have to experiment. There is no fixed answer. First off check the datasheet, you certainly want flash on and all the peripherals out of reset. Fastest clock is probably good but understand that running from flash does not necessarily scale up, depends on the part (do they have wait states that have a table relative to processor clock speed).
Changing states burns power. So you want to try to flip as many as you can. But it could be that you want to flip gpio or other external pins rather than try to get the processor core to pull more power.
You will want a nice meter for this one that can measure milliamps or milliwatts.
Things like multiply and divide in theory take a good percentage of chip space, if they implement it in one clock, but some of these mcus naturally won't do that, the instruction will be multi clock or at times the chip vendor can choose (if they buy an arm core for example). But you may need to deal with data patterns to to get more consumption. For core stuff you probably want to do as much register based stuff vs read/save things to memory.
You probably want to write the test code in assembly language. Will want to start with an infinite loop, branch to self, measure power/current. Complicate the loop, additional memory cycles, or alu operations, see if you can detect a power difference (you may find that you are not going to be able to make much difference with the core). Depending on the mcu design you may/should get better execution performance running from ram, but it depends of course. ram tends to be faster in mcus than flash. then do things like flip gpio pins, leave them high, etc. If you have LEDs turn them all on naturally, etc.
There is no one answer for this, so no one benchmark nor one solution pushes any random chip the hardest. Assume that if possible to see a noticeable difference for a particular chip, that the test would be specific to that chip and not necessarily the same solution for other chips from the same company, much less chips from other companies.

efm32 - efm32gg380f1024 - rtc features

I'm working with the efm32gg380f1024 on a project.
I currently use the BURTC timer (ULFRC clock) as tick source and I would like to use the normal RTC timer(LFRC clock) as well.
Do they exclude each other or can I use both the same time?
I was wondering if someone has already experience with the GG-series of silicon labs and give me some hints?
also what I'm wondering, I do have both LFXO and HFXO on my board currently not used. when I initialize the external clock setup, can I disable the interal rcos since they are not used (??) and just need energy.
the target is battery powered and each uWs counts..
thanks
You have a couple of questions here.
Yes you can use the LETIMER (which is what I think you mean when you say LERTC) peripheral independently of the RTC. They are separate peripherals, but note that the LETIMER is clocked from the same clock as the RTC.
As for using the external crystal oscillators, you need only enable the clocking sources that you actually use. However, clocking sources and entry/exit of the various low power energy modes interact. It can be rather tricky and complex. I suggest you use emlib to control these peripherals and in particular to enter and exit lower power energy modes.
If power consumption is important to you, note that high frequency clocking of the processor core consumes lots of power. Of course this must be traded off with how long you remain awake before going back to a lower power mode and with any real time requirements you might have for processing. Pushing off work to peripherals and using DMA to perform data movement is generally a win. Expect to do a good bit of tuning and you will need ways to accurately measure the power consumption. Using internal RC oscillators for clocking may be sufficient and a lower power approach. The low frequency external crystals tend to be 32 KHz clock crystals and don't consume that much power. They are a good alternative to the internal RC oscillators if you need better frequency stability.

Why people wouldn't just use the maximum available clock in microcontrollers [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 4 years ago.
Improve this question
The topic is rather straight forward and I admit i wasn't able to find much on google.
Recently i started to code on STM32 and for a person that is coming from PC-related application, setting all the clocks was rather new.
I was wondering why a developer would want to discard/avoid maximum clock and in which condition?
Say that a microcontroller could work at 168Mhz, why should i choose 84Mhz?
Is it mainly related to power consumption? Are there any other reason?
Why the STM32 team (and microchip as well i guess) took the hassle of setting up a really nice UI on STM32CubeMX to select different combination?
Why should i use an external oscillator directly rather than the PLL path if i can achieve higher working frequency?
Is it mainly related to power consumption?
Yes, mainly. Lower frequency means lower consumption.
One could also save power by doing the work fast, then putting the cpu to sleep, thus improving average consumption, but the power supply might not like the variable load, and exact timekeeping would be rather difficult.
Are there any other reason?
Yes. Some peripherals don't work above certain frequencies. An example: the STM32F429 core can run at 180 MHz, but then there is no way to generate 48 MHz for USB. In order to use USB, the core must run at 168 MHz.
Why should i use an external oscillator directly rather than the PLL path if i can achieve higher working frequency?
An external oscillator has a much higher accuracy than an internal one, and it may take too long for a PLL to stabilize when waking up from standby. It depends on the application requirements.
Power is probably the main reason. But there may be various other reasons that a specific clock speed is used such as:
EMC emmissions.
Avoiding harmonics which interfere with sensitive analogue electronics.
Driving timers / clocks / ADCs etc that are required to be run at a very specific frequency as part of the design (For example I worked on a processor that run at 120MHz, however in order to get the exact required ADC sampling we had to run at something like 119.4MHz).
You might want to use an external oscillator if you intend to use the clock elsewhere on the board. Also you might want to use a very accurate crystal, or maybe not want to wait for the PLL to lock.
There are lots of reasons. However if you are doing something straight forward and don't care about power consumption or noise then running at max speed with the PLL is probably the best place to start in my opinion.
Power is the obvious one, and as it has been touched on in other answers but not directly. Performance. Faster clock does not mean faster code. ST has this magic cachy thing in front of the flash in addition to a real-ish cache in front of the flash (and disabled the arm cache it appears on the cortex-m4's I have tried). But in general the flash is your bottleneck, if as you see on a number of other vendor parts and sometimes ST, you have to keep adding wait states as you increase the system clock. So say at 16Mhz no wait states, at 32, one, 48 two and so on, depends on the system, you are dancing around the speed limit of the flash, making the processor sit around extra clocks while it waits for instructions to come in. And even on an st but easier to see on others, that directly affects your performance, you want to be perhaps right at the frequency where you go from zero to one wait states to maximize what you can feed the cpu.
Some designs the flash is already at half speed of the cpu/system clock where sram generally tracks and can cover the full range, so take the same code at zero wait states and run from flash then run from ram, on some number of MCUs the same machine code runs half speed on flash as it does on sram. some it is one to one and then when you add a wait state flash is half speed vs ram, and so on.
Peripherals have the same problem as mentioned. You may have to use a prescaler on the peripheral clock, so now reading a gpio pin that at 24mhz might have taken one clock at 80mhz may take three or four clocks.
PLLs are analog and jittery, they dont necessarily "lose" any clock cycles, but are worse as far as jitter as the oscillator which itself has a spec on jitter and accuracy as well as temperature affects. So the internal RC is the worst quality clock, direct from the oscillator bypassing the PLL is the best, then multiplying with the PLL will add jitter, but will allow you to go faster.
Back to power. The battery in your remote control on your TV might last a year or so (infra red, the bluetooth ones are days or weeks), they run the lowest clock rate they can to barely do the job and stay off as much as possible or in low power state. If they were to hop up to 120Mhz when they were awake and the battery now lasts weeks or half a year, vs a year, for no real benefit other than it is really cool to run at full speed. That doesnt make much sense. We heavily rely on battery based products now, if the microcontroller in the bluetooth module on your phone ran at its fully rated pll speed, and the microcontroller in your wifi module in your phone, and the one that drives the display, etc, all ran at max speed, your phone might not last even a day on one full charge. Nothing was gained by running faster, but something was lost.
For hobbies and stuff plugged into the wall, burn whatever power you want, but a noticable percentage of the mcu market is about price and power, chips that are screened to a higher speed cost more, lower speed parts, in some cases are just the chips that failed the higher screen, and cost less. tighter smaller code uses less flash you can buy a smaller/cheaper part, your clock can run slower because it takes fewer instructions to do the same thing than a possibly sloppy program and/or bad choice in programming languages, bigger part, lower yield both add to cost. then lowering the clock as low as you can go to keep your tightly written code to just barely meet timing uses the least amount of power ideally (As well as turning off or not turning on peripherals you are not using and prescaling the ones that are on even slower).
For cost and power you want the slowest clock you can tolerate with the smallest binary that is also tight and efficient so that you can just barely make timing. That is your ideal goal. But if you plan for field upgrades then you need to leave some margin for slower/larger code to be part of the upgrade and not have a dramatic effect on power consumption.

What does CPU frequency represent in an Arduino board?

I'm new to Arduino and microcontrollers.
I was studying the specs and found that even the same board may have different frequencies with different input voltages (3.3V vs 5V). So the question is, what does frequency represent? Does it represent how many lines of assembly code it's able to run? Or the maximum PWM frequencies it's able to output?
A further question would be, if I'm looking for a board for a specific project, how do I decide which frequency I will need a priori, instead of trying everything out and see which one works?
What makes me more confused is that when it comes to computer CPUs, it seems that lower frequency CPUs can actually run faster than higher frequency ones (e.g. Intel). So how do I actually know how fast a microcontroller can run?
By frequency we mean the frequency of the CPU clock. Say your Arduino Uno runs on 16 MHz, which is 16,000,000 Hertz.
That means there are 16 Million clock cycles per second. The CPU executes the byte-code of the program. One Assembler instruction can actually take any number of CPU cycles to execute, usually between 1 and 4 cycles for simple stuff, and a little bit more heavy arithmetic and writing to memory. So it's a rough estimate of how many "lines of assembler" (that is, byte-code instructions) it can run per second. A measurement which is a little bit better is the "MIPS" value, the "Millions of instructions per second". There are other benchmark types for CPUs which are more accurate.
If you take at the datasheet for the AVR microprocessor architecture, you can see the cycles that each instructions needs: (link: http://www.atmel.com/images/Atmel-0856-AVR-Instruction-Set-Manual.pdf)
So for an ADD Rd, Rr instruction, an AVR CPU needs 1 clock cycle.
Take a desktop Intel CPU for example. It's common these days that they have a clock frequency of 2 GHz or more, which is 2 Billion cycles per second compared to the 16 million cycles per second on the Arduino's AVR CPU. So the Intel CPU beats the Arduino by far. Then again, the Arduino is designed for completley different stuff - it's a small microcontroller with low overhead, runs no OS etc. The use-case for such a CPU (and the architecture) is just different, which makes comparing them unjustified. There many other factors in play, like multi-core CPUs (4 Intel CPU cores vs. 1 AVR) and command pipelining, the speed of your memory / RAM, etc. It's really hard to compare a CPU to another one in every use-case possible, but for "general purpose computing", the Desktop CPUs (AMD, Intel, x64 architecture) far outruns the processing power of a mere Arduino AVR CPU.
I hope this clears up some confusion.
I think one confusion you have may be chip specific, I am not going to look it up right now but I do remember seeing this, the chip spec may say that for this input voltage range it can handle this frequency and for this voltage range it cannot. I think sparkfun the 3.3 are 8mhz and 5.0 are 16mhz or something like that. Anyway, that is not generally the case, but it is a chip by chip vendor by vendor thing and that is why you have to read the datasheet. Has nothing to do with arduinos or avrs specifically, just a general chip design thing.
How do you know how fast your microcontroller can run? That is a very loaded question, depends on your definition of fast. If it is simply what clock frequencies can I use, well "just read the datasheet" for that part, and then depending on your board design choose from what is available, if you do not have any external clocks then your choices may or may not be more limited, you may or may not have a pll that you can use to multiply the clock source.
if your definition of fast is how fast can I perform this task, how many whatevers per second or how much wall clock time does it take to complete some specific task. Well that is a benchmark problem and there are so many variables that there is actually no real answer. Yes it is very true that an x86 can have a lower clock and run faster than some other x86, historically the newer ones can do less stuff per clock than older ones for the same binaries, you have to then tune the compile to the newer chip and then you might get back some of your mips to mhz. but that is in part because you are using a different chip design that just speaks the same language (machine code). You can have a tall person that can recite a poem faster than a short person, both using english and the same poem, has nothing to do with them being short or tall, just that they are different humans.
There are different avr core variations but not remotely on par with the different x86 architectures. so while comparing a tiny vs an xmega you can probably have the xmega run "faster" at the same clock rate simply because it has more registers or a bigger address space, etc. But instructions per second is probably not really different, could be, but my guess is not so much.
Then there is the compiler, the compiler plays a huge role in how "fast" your code runs, change compilers or compiler versions or compiler settings and the machine code produced from the same high level source code (C for example) can vary greatly and as a result can have dramatic effects on the "speed" of the code. Take the dhrystone for example, very easy to demonstrate that the same exact source code on the same exact chip/board, same clock rates, etc can execute at vastly different speeds based on either using different compilers, versions or command line settings, kinda proving that the godfather of benchmarks is basically useless in providing any meaningful information.
Microcontrollers make the problem much worse as you often are running the program out of flash, and many, not all, but many have the ability to either divide or multiply or both the clock, but the flash is not always designed for the full range. You might have a chip that boots on an internal clock at 8mhz but you can use the pll to multiply that up to say 80mhz. But not uncommon that the flash is limited to say 16mhz on a chip like that so at 8mhz the flash can deliver an item say an instruction every cpu clock, but at 20mhz you have to put a wait state and although the cpu is running much faster you can only feed it at 16mhz so it is waiting around more, and then acts fast when it gets something, is it really "faster" or is clocking up making you slower. Certainly at just under 16mhz in this fantasy chip I am describing you can keep it to zero wait states so it is really faster, not necessarily twice as as there are other factors, but definitely faster than 8mhz. just at or above 16mhz though you take a huge performance hit compared to just under 16mhz. at just under 32mhz though it is pretty fast compared to just under, then at just over 32mhz another wait state setting and much slower again even though the clock is basically the same and so on.
Then there is the fetching, how does the cpu actually fetch, like an arm where it fetches a bunch of data per fetch transaction, even if it is not going to execute all of them if you branch to 0x1004 and at that address there is a branch to 0x2008 the core might fetch 0x10 bytes from 0x1000 to 0x100F, THEN extract the 0x1004 word/instruction, decode it to find it is a branch then read 0x10 bytes from 0x2000. basically reading 0x20 bytes to find 2 instructions. Take two instructions if both are in the 0x10 bytes then good if one is at 0x100C and the other at 0x2000 that is a performance hit. take this internal information and apply it to an application and all of its jumping around, changing one line of code or adding or removing a single nop to the bootstrap (causing the alignment of the program to change in the address space) can cause anywhere from a tiny to a large change in performance, swap two helper function sin the source code of your program, in the text, causing them to land in different address spaces once compiled, can have little to major performance effects without actually changing the functions themselves.
So performance is first of a foolish task to go after in one respect, in the other respect all that matters is your program as written with the compiler you are using on the hardware you are using, it is as fast as it is, and there are things you can do to make that code faster on that compiler on that target on that day, by changing compiler settings or the code or both. And ideally you build your final firmware, performance test that, and never build again as if you build a year or two later it may be on a different host compiler with a different compiler or compiler version and all bets are off on performance.
How do you pick a board, how much flash, ram, features, clock rate. A lot of it is experience by just trial and error, you fortunately live in a time where you can literally try hundreds of boards all of which cost anywhere from a few bucks to like 10 or 20 each, different cpu architectures different chip vendors, etc. there are many compiler choices and even languages available, basically there are too many easy to acquire choices, unlike back in the day when the parts were pretty cheap but you may have had to build your own board, write your code in asm, maybe even create your own assembler, etc. Have a rom programmer that cost hundreds to thousands of dollars. So go with the AVR you have and play with its features, play with the compiler and/or write or both. Do experiments to see if there are fetch effects or not. If you have clocking choices mess with those see what happens.
Of course all of this starts with reading the chip documentation from the vendor.

Arduino drone project

I am working on a drone project and currently choosing a board to use. Is it possible to use an Arduino Nano for all needs which are:
Gyroscope and Accelerometer
Barometer (as an altimeter)
Digital magnetometer
WiFi (to send telemetry for processing)
GPS module
4 motors (of course)
P.S:
I know nothing about Arduino. However I have a good ASM, C/C++, programming background and I used to design analog circuits.
I would like to avoid using ready-made flight controllers.
Pin count should not be too much of an issue if using I²C sensors, they would simply all share the same two pins (SCL, SDA).
I agree that the RAM could be a limitation, the processing power (30 MIPS for an arduino uno) should be sufficient.
On an arduino mega, the APM project ran for years with great success.
I believe it's possible to do a very simplified drone flight controller with an Arduino nano and several I²C sensors + GPS.
But even with a more advanced microcontroller it's not a trivial task.
*** If you still want to try the experiment, have a look at openlrs project : https://code.google.com/p/openlrs/ . It's quite old (there are several derived projects too), but it runs on a hardware similar to arduino uno (atmega328). It provides RC control, and quad flight controller with i²c gyroscopes, accelerometers (based on wii remote), and barometer.
It also parse data from the GPS, but afaik it doesn't provide autonomous navigation but it should be possible to add it without too much additional work.
edit : about the available RAM.
I understand that at first sight 2kb of RAM seems a very small amount. And a part of it is already used by Arduino, for example the serial library provides two 64 bytes FIFO, using some RAM. Same for the Wire (I²C) library, although a smaller amount. It also uses some RAM for stack and temporary variables, even for simple tasks such as float operations. Let's say in total it will use 500 bytes.
But then what amount of RAM is really required ?
- It will have a few PIDs regulators, let's say that each one will use 10 float parameters to store PID parameters, current value etc. So it gives 40 bytes per regulator, and let's say we need 10 regulators. We should need less, but let's take that example. So that's 400 bytes.
-Then it will need to parse GPS messages. A GPS message is maximum 80 bytes. Let's allow a buffer of 80 bytes for GPS parsing, even if it would be possible to do most of the parsing "on-the-fly" without storing it in a buffer.
-Let's keep some room for the GPS and sensors data, 300 bytes which seems generous, as we don't need to store them in floats. But we can put in it the current GPS coordinates, altitude, number of satellites, pitch, roll etc
-Then some place for application data, such as home GPS coordinates, current mode, stick positions, servo values etc.
The rest is mostly calculations, going from the current GPS coordinates and target coordinates to a target altitude, heading etc. And then feed the PIDs to the calculated pitch and roll. But this doesn't require additional RAM.
So I would say it's possible to do a very simple flight controller using 1280 bytes. And if I was too low or forgot some aspects, there's still more than 700 bytes available.
Certainly not saying it's easy to do, every aspect will have to be optimized, but it doesn't look impossible.
It would be a trick to make all of that work on a Nano. I would suggest you look at http://ardupilot.com/ they have built a lot of cool thinks around the ARM chip (same as an Arduino) and there are some pretty active communities on there as well.
Even if you didn't run out of pins (and you probably would), by the time you wrote the code for the motors and the GPS, you will run out of RAM.
And that's not even getting into the CPU speed, which is nowhere near enough. As mentioned in the other answer, you'll be better off with a Cortex M-x CPU.
Arguably, you could use a few Nanos, one per task, but chaining them together would be a nice mess...

Resources