I am working on a drone project and currently choosing a board to use. Is it possible to use an Arduino Nano for all needs which are:
Gyroscope and Accelerometer
Barometer (as an altimeter)
Digital magnetometer
WiFi (to send telemetry for processing)
GPS module
4 motors (of course)
P.S:
I know nothing about Arduino. However I have a good ASM, C/C++, programming background and I used to design analog circuits.
I would like to avoid using ready-made flight controllers.
Pin count should not be too much of an issue if using I²C sensors, they would simply all share the same two pins (SCL, SDA).
I agree that the RAM could be a limitation, the processing power (30 MIPS for an arduino uno) should be sufficient.
On an arduino mega, the APM project ran for years with great success.
I believe it's possible to do a very simplified drone flight controller with an Arduino nano and several I²C sensors + GPS.
But even with a more advanced microcontroller it's not a trivial task.
*** If you still want to try the experiment, have a look at openlrs project : https://code.google.com/p/openlrs/ . It's quite old (there are several derived projects too), but it runs on a hardware similar to arduino uno (atmega328). It provides RC control, and quad flight controller with i²c gyroscopes, accelerometers (based on wii remote), and barometer.
It also parse data from the GPS, but afaik it doesn't provide autonomous navigation but it should be possible to add it without too much additional work.
edit : about the available RAM.
I understand that at first sight 2kb of RAM seems a very small amount. And a part of it is already used by Arduino, for example the serial library provides two 64 bytes FIFO, using some RAM. Same for the Wire (I²C) library, although a smaller amount. It also uses some RAM for stack and temporary variables, even for simple tasks such as float operations. Let's say in total it will use 500 bytes.
But then what amount of RAM is really required ?
- It will have a few PIDs regulators, let's say that each one will use 10 float parameters to store PID parameters, current value etc. So it gives 40 bytes per regulator, and let's say we need 10 regulators. We should need less, but let's take that example. So that's 400 bytes.
-Then it will need to parse GPS messages. A GPS message is maximum 80 bytes. Let's allow a buffer of 80 bytes for GPS parsing, even if it would be possible to do most of the parsing "on-the-fly" without storing it in a buffer.
-Let's keep some room for the GPS and sensors data, 300 bytes which seems generous, as we don't need to store them in floats. But we can put in it the current GPS coordinates, altitude, number of satellites, pitch, roll etc
-Then some place for application data, such as home GPS coordinates, current mode, stick positions, servo values etc.
The rest is mostly calculations, going from the current GPS coordinates and target coordinates to a target altitude, heading etc. And then feed the PIDs to the calculated pitch and roll. But this doesn't require additional RAM.
So I would say it's possible to do a very simple flight controller using 1280 bytes. And if I was too low or forgot some aspects, there's still more than 700 bytes available.
Certainly not saying it's easy to do, every aspect will have to be optimized, but it doesn't look impossible.
It would be a trick to make all of that work on a Nano. I would suggest you look at http://ardupilot.com/ they have built a lot of cool thinks around the ARM chip (same as an Arduino) and there are some pretty active communities on there as well.
Even if you didn't run out of pins (and you probably would), by the time you wrote the code for the motors and the GPS, you will run out of RAM.
And that's not even getting into the CPU speed, which is nowhere near enough. As mentioned in the other answer, you'll be better off with a Cortex M-x CPU.
Arguably, you could use a few Nanos, one per task, but chaining them together would be a nice mess...
Related
I have been using the Teensy 3.6 microcontroller board (180 MHz ARM Cortex-M4 processor) to try and implement a driver for a sensor. The sensor is controlled over SPI and when it is commanded to make a measurement, it sends out the data over two lines, DOUT and PCLK. PCLK is a 5 MHz clock signal and the bits are sent over DOUT, measured on the falling edges of the PCLK signal. The data frame itself consists of 1,024 16-bit values.
My first attempt consisted a relatively naïve approach: I attached an interrupt to the PCLK pin looking for falling edges. When it detects a falling edge, it sets a bool that a new bit is available and sets another bool to the value of the DOUT line. The main loop of the program generates a uint_16 value from these bits and collects 1,024 of these values for the full measurement frame.
However, this program locks up the Teensy almost immediately. From my experiments, it seems to lock up as soon as the interrupt is attached. I believe that the microprocessor is being swamped by interrupts.
I think that the correct way of doing this is by using the Teensy's DMA controller. I have been reading Paul Stoffregen's DMAChannel library but I can't understand it. I need to trigger the DMA measurements from the PCLK digital pin and have it read in bits from the DOUT digital pin. Could someone tell me if I am looking at this problem in the correct way? Am I overlooking something, and what resources should I view to better understand DMA on the Teensy?
Thanks!
I put this on the Software Engineering Stack Exchange because I feel that this is primarily a programming problem, but if it is an EE problem, please feel free to move it to the EE SE.
Is DMA the Correct Way to Receive High-Speed Digital Data on a Microprocessor?
There is more than one source of 'high speed digital data'. DMA is not the globally correct solution for all data, but it can be a solution.
it sends out the data over two lines, DOUT and PCLK. PCLK is a 5 MHz clock signal and the bits are sent over DOUT, measured on the falling edges of the PCLK signal.
I attached an interrupt to the PCLK pin looking for falling edges. When it detects a falling edge, it sets a bool that a new bit is available and sets another bool to the value of the DOUT line.
This approach would be call 'bit bashing'. You are using a CPU to physically measure the pins. It is a worst case solution that I see many experienced developers implement. It will work with any hardware connection. Fortunately, the Kinetis K66 has several peripherals that maybe able to assist you.
Specifically, the FTM, CMP, I2C, SPI and UART modules may be useful. These hardware modules are capable of reducing the work load from processing each bit to groups of bits. For instance, the FTM support a capture mode. The idea is to ignore the PCLK signal and just measure the time between edges. These times will be fixed in a bit period/CLK. If the timer captures a two bit period, then you know that two ones or zeros were sent.
Also, your signal seems like SSI which is an 'digital audio' channel. Unfortunately, the K66 doesn't have an SSI module. Typical I2C is open drain and it always has a start bit and fixed word size. It maybe possible to use this if you have some knowledge of the data and/or can attach some circuit to fake some bits (to be removed later).
You could use the UART and time between characters to capture data. The time will be a run of bits that aren't the start bit. However it looks like this UART module requires stop bits (the SIM feature are probably very limited).
Once you do this, the decision between DMA, interrupt and polling can be made. There is nothing faster than polling if the CPU uses the data. DMA and interrupts are needed if you need to multiplex the CPU with the data transfer. DMA is better if the CPU doesn't need to act on most of the data or the work the CPU is doing is not memory intensive (number crunching). Interrupts depend on your context save overhead. This can be minimized depending on the facilities your main line uses.
Some glue circuitry to adapt the signal to one of the K66 modules could go a long way to making a more efficient solution. If you can't change the signal, another (NXP?) SOC with an SSI module would work well. The NXP modules usually support chaining to an eDMA module as well as interrupts.
I'm new to Arduino and microcontrollers.
I was studying the specs and found that even the same board may have different frequencies with different input voltages (3.3V vs 5V). So the question is, what does frequency represent? Does it represent how many lines of assembly code it's able to run? Or the maximum PWM frequencies it's able to output?
A further question would be, if I'm looking for a board for a specific project, how do I decide which frequency I will need a priori, instead of trying everything out and see which one works?
What makes me more confused is that when it comes to computer CPUs, it seems that lower frequency CPUs can actually run faster than higher frequency ones (e.g. Intel). So how do I actually know how fast a microcontroller can run?
By frequency we mean the frequency of the CPU clock. Say your Arduino Uno runs on 16 MHz, which is 16,000,000 Hertz.
That means there are 16 Million clock cycles per second. The CPU executes the byte-code of the program. One Assembler instruction can actually take any number of CPU cycles to execute, usually between 1 and 4 cycles for simple stuff, and a little bit more heavy arithmetic and writing to memory. So it's a rough estimate of how many "lines of assembler" (that is, byte-code instructions) it can run per second. A measurement which is a little bit better is the "MIPS" value, the "Millions of instructions per second". There are other benchmark types for CPUs which are more accurate.
If you take at the datasheet for the AVR microprocessor architecture, you can see the cycles that each instructions needs: (link: http://www.atmel.com/images/Atmel-0856-AVR-Instruction-Set-Manual.pdf)
So for an ADD Rd, Rr instruction, an AVR CPU needs 1 clock cycle.
Take a desktop Intel CPU for example. It's common these days that they have a clock frequency of 2 GHz or more, which is 2 Billion cycles per second compared to the 16 million cycles per second on the Arduino's AVR CPU. So the Intel CPU beats the Arduino by far. Then again, the Arduino is designed for completley different stuff - it's a small microcontroller with low overhead, runs no OS etc. The use-case for such a CPU (and the architecture) is just different, which makes comparing them unjustified. There many other factors in play, like multi-core CPUs (4 Intel CPU cores vs. 1 AVR) and command pipelining, the speed of your memory / RAM, etc. It's really hard to compare a CPU to another one in every use-case possible, but for "general purpose computing", the Desktop CPUs (AMD, Intel, x64 architecture) far outruns the processing power of a mere Arduino AVR CPU.
I hope this clears up some confusion.
I think one confusion you have may be chip specific, I am not going to look it up right now but I do remember seeing this, the chip spec may say that for this input voltage range it can handle this frequency and for this voltage range it cannot. I think sparkfun the 3.3 are 8mhz and 5.0 are 16mhz or something like that. Anyway, that is not generally the case, but it is a chip by chip vendor by vendor thing and that is why you have to read the datasheet. Has nothing to do with arduinos or avrs specifically, just a general chip design thing.
How do you know how fast your microcontroller can run? That is a very loaded question, depends on your definition of fast. If it is simply what clock frequencies can I use, well "just read the datasheet" for that part, and then depending on your board design choose from what is available, if you do not have any external clocks then your choices may or may not be more limited, you may or may not have a pll that you can use to multiply the clock source.
if your definition of fast is how fast can I perform this task, how many whatevers per second or how much wall clock time does it take to complete some specific task. Well that is a benchmark problem and there are so many variables that there is actually no real answer. Yes it is very true that an x86 can have a lower clock and run faster than some other x86, historically the newer ones can do less stuff per clock than older ones for the same binaries, you have to then tune the compile to the newer chip and then you might get back some of your mips to mhz. but that is in part because you are using a different chip design that just speaks the same language (machine code). You can have a tall person that can recite a poem faster than a short person, both using english and the same poem, has nothing to do with them being short or tall, just that they are different humans.
There are different avr core variations but not remotely on par with the different x86 architectures. so while comparing a tiny vs an xmega you can probably have the xmega run "faster" at the same clock rate simply because it has more registers or a bigger address space, etc. But instructions per second is probably not really different, could be, but my guess is not so much.
Then there is the compiler, the compiler plays a huge role in how "fast" your code runs, change compilers or compiler versions or compiler settings and the machine code produced from the same high level source code (C for example) can vary greatly and as a result can have dramatic effects on the "speed" of the code. Take the dhrystone for example, very easy to demonstrate that the same exact source code on the same exact chip/board, same clock rates, etc can execute at vastly different speeds based on either using different compilers, versions or command line settings, kinda proving that the godfather of benchmarks is basically useless in providing any meaningful information.
Microcontrollers make the problem much worse as you often are running the program out of flash, and many, not all, but many have the ability to either divide or multiply or both the clock, but the flash is not always designed for the full range. You might have a chip that boots on an internal clock at 8mhz but you can use the pll to multiply that up to say 80mhz. But not uncommon that the flash is limited to say 16mhz on a chip like that so at 8mhz the flash can deliver an item say an instruction every cpu clock, but at 20mhz you have to put a wait state and although the cpu is running much faster you can only feed it at 16mhz so it is waiting around more, and then acts fast when it gets something, is it really "faster" or is clocking up making you slower. Certainly at just under 16mhz in this fantasy chip I am describing you can keep it to zero wait states so it is really faster, not necessarily twice as as there are other factors, but definitely faster than 8mhz. just at or above 16mhz though you take a huge performance hit compared to just under 16mhz. at just under 32mhz though it is pretty fast compared to just under, then at just over 32mhz another wait state setting and much slower again even though the clock is basically the same and so on.
Then there is the fetching, how does the cpu actually fetch, like an arm where it fetches a bunch of data per fetch transaction, even if it is not going to execute all of them if you branch to 0x1004 and at that address there is a branch to 0x2008 the core might fetch 0x10 bytes from 0x1000 to 0x100F, THEN extract the 0x1004 word/instruction, decode it to find it is a branch then read 0x10 bytes from 0x2000. basically reading 0x20 bytes to find 2 instructions. Take two instructions if both are in the 0x10 bytes then good if one is at 0x100C and the other at 0x2000 that is a performance hit. take this internal information and apply it to an application and all of its jumping around, changing one line of code or adding or removing a single nop to the bootstrap (causing the alignment of the program to change in the address space) can cause anywhere from a tiny to a large change in performance, swap two helper function sin the source code of your program, in the text, causing them to land in different address spaces once compiled, can have little to major performance effects without actually changing the functions themselves.
So performance is first of a foolish task to go after in one respect, in the other respect all that matters is your program as written with the compiler you are using on the hardware you are using, it is as fast as it is, and there are things you can do to make that code faster on that compiler on that target on that day, by changing compiler settings or the code or both. And ideally you build your final firmware, performance test that, and never build again as if you build a year or two later it may be on a different host compiler with a different compiler or compiler version and all bets are off on performance.
How do you pick a board, how much flash, ram, features, clock rate. A lot of it is experience by just trial and error, you fortunately live in a time where you can literally try hundreds of boards all of which cost anywhere from a few bucks to like 10 or 20 each, different cpu architectures different chip vendors, etc. there are many compiler choices and even languages available, basically there are too many easy to acquire choices, unlike back in the day when the parts were pretty cheap but you may have had to build your own board, write your code in asm, maybe even create your own assembler, etc. Have a rom programmer that cost hundreds to thousands of dollars. So go with the AVR you have and play with its features, play with the compiler and/or write or both. Do experiments to see if there are fetch effects or not. If you have clocking choices mess with those see what happens.
Of course all of this starts with reading the chip documentation from the vendor.
I am generating a certain signal (digital pulse) in one of my verilog module running on programmable logic in Xilinx Zynq chip. Signal is pretty fast, with clock of about 200MHz.
I also have a simple linux and framebuffer Qt interface running for later controlling my application.
How can I sample my signal in order to make oscilloscope like interface inside my Qt app? I want to be able to provide visual of the pulse I am generating.
What do I need to use to be able to sample enough data at such clock frequency? And how do I pass it with kernel module or mmap to Qt?
You would do best to do what most oscilloscopes do: sample the data to RAM, and only then transfer it to the processor for display/analaysis, at a more "relaxed" pace.
On the FPGA side you will need a state machine that detects some sort of start or trigger condition, probably after a bit in a mode register is set from the software side to arm it.
The state machine will then fill samples into a buffer made of one or more block rams. If you want to placing the trigger somewhere in the middle of the samples captured, you should it as a circular buffer, and have it record continuously, stopping configurable number of samples after the trigger, so that some desired number from before the trigger condition remain un-overwritten by newer ones following it.
Since FPGA block rams are typically dual port, you can simply hook the other port up to your CPU bus for readout. You will probably want a register to read the state of the sampling state machine, and if you go with the circular buffer approach, the address where it stopped, so that you can unwrap the data to a linear record of time.
Trying to do streaming realtime sampling might be possible, but would be a lot harder and it is not clear that you could do anything meaningful with the data so produced in real time. Still, if you want to try you would probably need to put a FIFO buffer in between the sampling and the processor bus, as you will probably only be able to consume data in chunks, while having to service other operations in between, so something is needed to absorb the constant-rate inflow of samples. Another approach could be to try to build a DMA engine which would write samples directly to external system ram, but that will likely be even harder.
You could also see if there are any high speed interfaces available in the CPU which you could leverage - they might be things originally intended for video, for example.
It also appears that you are measuring only a digital signal, ie, probalby one bit. If you want to handle a higher input sample rate than the FPGA fabric can support, that could mean you could potentially use something like a deserializer block at the edge of the FPGA to turn the 1-bit input stream into a slower stream of wider samples to store.
In terms of output, once you have a vector of samples in a buffer it's pretty simple to turn that into a scope/logic analyzer type plot, with as much zooming, cursor annotation, automatic measurement or whatever you like.
Also don't forget that if the intent is only to use this during development, FPGAs and their tools often have the ability to build a logic analyzer right into the design, with the data claimed over the programming interface for plotting on a PC.
curious if a Arduino could be configured to read and serial print raw binary bits from a proprietary serial encoder used on machine tools and robots... if so, might have a lot of other possible uses.
I made up a 120 volt servo drive for manually moving big fanuc machine/robot servos, handy during rebuilding/service to be able to move a axis without a control...but the older drives read 4 gray code channels kinda like a set of halls for brushless commutation...on the serial versions, same drive could move the motor if I could decipher the commutation bits and output graycode to the drive... a tiny Arduino looks like a ideal little thing to try doing this.
Scoped out the signals long ago, kinda know where the bits are, but need to be able to actually print them out thru 90 degrees of shaft rotation to find the 12 steps for commutation required by the drive.
Arduino is new to me, but in the past few days have been quite impressed with its abilities.
If anyone can suggest a way for Arduino to read a repeating 77 bit data stream at ~100K baud, I'm all ears... I think a 'serial snoop tool' with easily changed baud rates(including non-standard) and 'word length', then serial print out could be really handy. to prevent overflow in my case, could only do the serial print every X milliseconds, and I could just rotate slow enough to get a decent sample.
I'd use a logic analyser with the ability to decode serial streams - it's a much more versatile tool in the end. There are many of them - I've used the Saleae Logic, and an Open Bench Logic Sniffer to good effect in the past.
But I'm sure an Arduino could do it.
I am using Arduino UNO board. I have 24 analog channel which gives me 0~5v analog out put. Now my problem is I have only 5 analog channel. I wanted to read value from each channel for every 2 min and then switch to other channel. Can anyone suggest me in Hardware how can get analog value ?
I am planning to use 8:1 multiplexer or 16:1 multiplexer . Will it is correct way of doing it. Can you suggest other way of doing it in hardware ?
74HC4051,74HCT4051,ADG708,MD14051B,
IC I am planning to Use.dep[end on so,s1,s2 just switch the channel
As a start, you might need to know that even Arduino Uno also have internal MUX. In my experience of reading multiple analog channel, this is the approach that I take. However by taking this approach, I suggest you to recheck the analog value so adding MUX will not generate any error or bias.
This could be done by comparing the output of measurement with the MUX and output of measurement without the MUX. I used 74HC4051 and it works brilliantly, just make sure not to leave any pin floating. The only disadvantage of this method is that you will need to use some I/O to control the MUX, but if that is not an issue for you, then go ahead.
Any other method could be more complicated. It would require your analog channels to correlate with each other, and you need to find a way method to combine multiple analog channel into a single channel.
e.g: if your aim is to compare two analog value, instead of measuring the value and comparing the value in software, you could make use of op-amp comparison circuit to compare the value for you and take the comparison result instead.
Use the photon-pixel coupling method, it is a new approach in science for sampling an unlimited number of sensors in parallel.
Basically, each sensor output is an LED. If you have 10000 sensors, the output of all of them is inserted in a LED array, a LED matrix as the authors say. After that, the LED array is filmed by a video camera and the images are processed in real time by a computer. A software reads one pixel from each LED from the LED array and converts it to numerical values. So, your LED array will be converted in a matrix (with 10000 elements) filled with numbers that can be processed as you wish in your software. I don't know if I was explicit but you can read their article here: https://www.sciencedirect.com/science/article/pii/S2215016119300901
Note that classic multiplexing is serial, this approach is parallel.
The photon-pixel coupling method is truly ingenious because it solves two main problems in engineering: an unlimited number of sensors and their parallel sampling at video rate frequencies. Just imagine, we can read as many sensors as we wish. What I wander is if we can adapt the photon-pixel coupling to Arduino. I am new in the world of microcontrollers but I know Arduino can support a cam, so it should be possible.
If you are a PhD student then:
P.A. Gagniuc, C. Ionescu-Tirgoviste, R.G. Serban, E. Gagniuc. Photon-pixel coupling: A method for parallel acquisition of electrical signals in scientific investigations. MethodsX, 6:968-979, 2019.
To read more analog channels than inputs you have, an analog multiplexer is a good option. All the ones you suggested will work, but personally, I like the Analog Devices ICs for analog circuits, so I would take the ADG708, but this is just a personal preference.