Over the years I've worked on a number of microcontroller-based projects; mostly with Microchip's PICs. I've used various microcontroller simulators, and while they can be very helpful at times, I often find myself frustrated. In real life microcontrollers never exist alone and the firmware's behavior is dependent on the environment. However, none of the sims I've used provide decent support for anything outside the microcontroller.
My first thought was to model the entire board in Verilog. But, I'd rather not create an entire CPU model, and I haven't had much luck finding existing models for the chips I use. Regardless, I really don't need, or want, to simulate the proc at that level of detail, and I'd like to retain the debugging facilities provided by a regular processor sim.
It seems to me that the ideal solution would be a hybrid simulator that interfaces a traditional processor simulator with a Verilog model.
Does such a thing exist?
I've used the Altera Nios II processor embedded on a FPGA. Altera provides a toolchain for simulating the CPU (with its software) together with your custom logic in a simulator. I suppose that a similar setup can be achieved by downloading a VHDL/Verilog core of your CPU (Did you try opencores ? They have lots of stuff there).
But keep in mind that it is going to be mind-bogglingly slow, so don't expect to simulate whole complex processes this way. The best you can hope for is simulating fine software-hardware interaction points to debug problems. If you need a deeper simulation, consider running it on a FPGA with built-in monitoring code.
For the "simulate the whole board" approach,
The Free Model Foundry has a large number of models, some in VHDL others in Verilog, that are available now.. but you'll need to pay to have new models created. These are very helpful in being sure the board is built correctly.
But I think the more common approach when debugging your PIC is to just build a board, then work on the firmware. In the chip world, (where the firmware is running on a microprocessor in a chip that hasn't gone to fab yet) people often resort to very expensive systems (or renting time on them) that allow compiling part of the design into an emulator while the rest of the design runs in the normal simulator environment. Without the barrier of an expensive mask set for the chip, the cost is just not justifiable for a Circuit board. Although I've heard of some creative applications of Simulink (Mathworks) with FPGA's, but my recollection is that one either ran the system on the computer, or programmed the device and ran the same thing in realtime.
I believe both Cadence (ask about Palladium) and Mentor Graphics have that integrated solution if you have the money to spend on it.
What I have done recently is create an interface between the simulation environment and host system. Different hdl simulators have different interfaces, and getting the simulator NOT think in batch mode, the traditional simulation model, instead run for ever like a real design is half of the problem.
Then from the host using C (or whatever) you can create abstractions that may or may not allow you to write your application software for whatever target (depending on what language and compiler capabilities you have). For example you can make a generic poke and peek function and on the final target have those actually poke and peek memory or I/O, but for simulation through the abstraction you talk to a testbench in the simulation that simulates the same memory cycle.
I went one step further and used (Berkeley) sockets between the host and test bench so that the simulation can keep running while the host applications stop and start. Not unlike having a real processor with an OS that you are starting applications and running them to completion and starting another. At least for test applications, for delivery you probably only have one app.
By creating these abstraction layers I can write real applications that will be used on the target when it is built. Along the way you can use software simulation of the logic initially, then if you like build an fpga with an abstraction interface (throw away logic) say a uart for example. Replace the shim between the applications abstraction layer and the simulator with a uart interface, or whatever. Then when you marry the processor and logic in the same chip or on the same board, replace the abstraction layer again with direct calls to whatever interfaces they have always though they were talking to. If something breaks and you have retained the abstraction layer you can take the application back to the simulation model and have access to all of your logic internals.
Specifically this time around I am using a hdl language cyclicity cdl which is on sourceforge, the documentation needs some help but the examples may get you going, and it produces synthesizable verilog, so you get an extra win there. I threw out all the scripting batch stuff other than the bare minimum needed to connect and start a C simulation model. So my test bench is in C (well C++ technically) the sockets layer was done there. The output can be .vcd files which gtkwave uses. Basically you can do the bulk of your HDL design using open source software with no licenses, etc. By adding one or two lines of code to the CDL simulation part I was able to have it run as an infinite loop, which I can say works quite well, there doesnt appear to be any memory leaks, etc.
both modelsim and cadence have standardized ways of connecting host C programs to the simulation world and from there you can use an IPC to get to host applications talking to an abstraction layer api.
this is probably way overkill for a pic, I have given up pics a while ago for the faster and C friendly arm based micros anyway. There is/was an open core pic that you could simply incorporate into your simulation, even though that is not what you are trying to do here.
Not that I've seen. Your best bet is to properly define the interfaces and behavior between the uC and FPGA and then define a series of test waveforms that can be applied using an automated tester. You would have to make the automated tester (or perhaps a logic analyzer may have some such functionality) out of an FPGA or uC (apply waveform, watch interrupts, breakpoints, etc). If you really want I know that Opencores.org has PIC and AVR-like 8-bit uC cores defined as VHDL, so you could implement your entire project on the FPGA and then just debug that.
Generally there isn't need to model the CPU at the RTL level. Since you don't really care about what it does bit by bit; you generally care about what it does, e.g. register values, memories and bus access.
The simplest is call at Bus Functional Model. This just generates the read and writes that the CPU does, often based on a text file. These are available for some CPUs and many popular buses (e.g. PCI, PCIe). THese simulate super fast.
The next step up is a functional cycle-accurate model. Those simulate fast. They are often encrypted.
Last is a full RTL model. Those usually are only available if you are working closely with the CPU vendor, e.g. using their core in your ASIC. Typically these are encrypted, unless you are a huge company.
Memory models are typically cycle-accurate (e.g. Micron).
My workmates from the hardware department use FPGA simulation software quite often to find timing-bugs and trace down strange behaviours.
Simulating one or two milliseconds can take several hours, so using the simulator for anything but very small things is not feasable.
You may want to have a look at SystemC though. http://en.wikipedia.org/wiki/SystemC
Related
I studied many subjects from physics to logic gates to processor.
I also studied computer architecture, compilers, Assembly x86, operating systems, GPU, ....
All the subjects mentioned above, for some reasons didn't cover what is going "after an executable file being produced by compilers" downward the processor.
Please could you provide me with resources explain these things. because the way I am thinking drive me crazy if I didn't understand why things works the way they are working.
Like I want to understand for-example; why UNIX files start with 'elf'? if you tell me it 's convention. then how the computer as machine understand that a file start by 'elf' being passed to it?
It is a job of operating system. Then how the computer understand the code of operating system ?. I know that processor will read it represented in Binary.
But how really the computer understand the binary? don't tell via transistors and logic gates. What I need to understand how the binary is being signaled to the computer hardware ? how to hardware is really being implemented to understand the binary?
Please any resources about this stage I mention above share it with me ?
Binary is really just electrical signals and small charges stored in memory circuits. The binary data (ex:0101) is just a number represented with 2 symbols instead of 10 (like in base 10 or decimal). What matters, is the way you interpret those numbers to give them meaning.
For example, in processor architectures like x64, the developers of the architecture will create an instruction set that has a certain amount of instructions with a very specific format. The binary data thus becomes interpreted to be those instructions. Since the CPU is manufactured in its logic circuits to understand those instructions, then it can execute them by doing what they should according to the architecture.
The CPU is also configured to start executing at a specific address. The motherboard manufacturer puts machine code in binary at that address to kickstart the CPU. This is called the firmware. The OS developer will put its machine code on the hard-drive at a conventional position and in a conventional format so that the firmware can find and execute it.
Even in electrical terms there needs to be a conventional interpretation of signals. A certain chip on your motherboard might interpret 3V as a high signal and another might be 5V. What matters is that interconnected chips understand each other's interface. For example, an x64 processor might be connected to modern DDR4 RAM. The CPU knows that if it applies a signal (let's say 5V square wave 2600MHz) to certain pins of that chip then each oscillation will write or read data in the memory cells that are selected. It knows that because it expects the RAM module to follow the DDR4 standard. If the RAM module doesn't follow the standard then the CPU will not be able to interact with it.
The program written in C and compiled on some other IDE/computer (or cross-compiling) and then loaded as binary data into the flash memory of the controller.
What am i not understanding in Bare Metal / No RTOS
Which program/code take care of loading from Flash to RAM?
Is the RAM in microcontroller have intelligence/program to understand binary or at time of compile the intelligence is added to the binary file by compiler?
Ideally your program runs in flash not ram. Many mcus you can, it would be an architecture limit primarily if running from ram is not supported. In a pinch you can run your code in ram if you need a trampoline to reprogram the flash as in downloading new firmware in the field (for a chip with only one flash bank that can't run and be erased/modified at the same time), or for performance, but if you need ram for performance then perhaps you need to rethink your design. small sections sure, but if the whole app has to be in ram for reasons other than development, you need to re-think your system design.
You can easily wrap your program with a small copy to ram bit of code, so that the mcu boots up the copy and jump program and then the main application runs in ram. that is your choice. somewhat trivial just a few lines of code. it is chip/architecture dependent on whether you can handle interrupts in that situation or how you need to design it (more than just a copy and jump for example, might need handlers in flash that hop over to ram too).
There is no magic here, the mcu processor is no different than others you need some non-volatile way to get the program in there. Like most others cpus your processor boots from a rom/flash, then as desired it works toward the final application be it an operating system or not. for an mcu the typical approach is to boot right into the application, run the application in flash for read only items (.text and .rodata) and the read-write in ram (.data, .bss) which is handled by knowing how to use your toolchain, which is a critical part of bare-metal success.
CPUs generally don't care about flash, ram, peripherals, they are just addresses, the cpu is very very dumb. You the programmer are smart you lay the tracks down for the cpu to follow, the instructions have to follow the rules and guide the processor. The processor starts in a well known way at a well known address or vector table, from there it is all on you to keep the processor on track by working within the address space where there are resources, flash, ram and peripherals. The processor may have rules on the address space it can fetch/execute from, or not, depends on the implementation. For implementations where the executable address space has both flash and ram then yes you can simply place code in ram and execute it.
Running code in ram on an mcu is the exception not the rule.
Commonly a microcontroller does not load the (single) program into RAM. Instead it is run "in-place" in the (flash or any other non-volatile) memory. The program is built so that the memory at the (fixed) start address contains the startup code of the program.
Having said that you might wonder how (static) variables are initialized with zero and non-zero values. That is done by the startup code linked in when the program is built.
There is no need to add any "intelligence", assuming you mean something like a byte-code interpreter to execute the binary commands. The CPU of the microcontroller executes the machine code directly. And your compiler generates exactly the machine code.
I will be asking a very subjective question, but it is important as I am looking to recover from failure to effectively use BlueZ programatically.
Basically I envision an IoT edge device that runs on a miniature computer (Ex: Raspberry pi or Intel Compute Stick). The device would then run AlpineLinux OS and interact with Cloud.
Since it is IoT environment, it is needless to mention the importance of Bluetooth BLE over ISM band. Hence the central importance of being able to customize and work with BlueZ.
I am looking to do several things with BlueZ BLE including but not limited to
Advertising
Pairing
Characteristic
Broadcast
Secure transport of data etc...
Since I will be needing full control over data, for data-processing and interacting with cloud (Edge AI or Data-science on Cloud) I am looking at three ways of using BlueZ:
Make DBus API calls to BlueZ Methods.
Modify BlueZ codebase and make install a custom bin.
(So that callback handlers can be registered and wealth of other bluez
methods can be invoked)
Invoke BlueZ using command line utils like hcitool/bluetoothctl inside a program using system() calls.
No 1 is where I have failed. It is exorbitant amount of effort to construct and export DBus objects and then to invoke BlueZ methods. Plus there is no guarantee that you will be able to take care of all BLE issues.
No 2 looks very promising and I want to fully explore how feasible it is to modify the BlueZ code to my needs.
No 3 is the least desirable option, but I want to have it as a fallback option nevertheless.
Given my problem statement, what is the most viable strategy forward? I am asking this aloud so that I do not make more missteps and cost myself time and efforts.
Your best strategy is to start with the second way (which you already found promising) as this is a viable solution and many developers go about this method in order to create their BlueZ programs. Here is what I would do:-
Write all the functionality of the system in some sort of flowchart or state machine. This helps you visualise your whole system and what needs to be done to reach your end goal.
Try to perform all the above functionality manually using bluetoothctl and btmgmt. This includes advertising, pairing, etc. I recommend steering away from legacy commands such as hcitool and hciconfig as these have been deprecated and have a very different code structure.
When stumbling upon something that is not the default in bluetoothctl/btmgmt or you want to tweak the functionality, update the source to do so.
Finally, once you manually get the system to perform the functionality that you need (it doesn't have to be all, it can just be a subset of the functions), you can move to automating the whole process. This involves modifying the source for bluetoothctl/btmgmt commands so that instead of manual intervention, everything would be event-driven.
This is a bonus, but if you can create automated tests using python or some other scripting language, then this would ensure that your system is robust and that previous functionality doesn't break when adding new ones.
By the end of this process, you'll have a much better understanding of the internals of bluetoothctl/btmgmt and D-BUS APIs that you might be able to completely detach your code from the original bluetoothctl/btmgmt or create the program from scratch.
You probably already know this, but when modifying the tools, this is the starting point for the source code:-
bluetoothctl - client/main.c
btmgmt - tools/btmgmt.c
For more references on using bluetoothctl commands and btmgmt, please see the links below:-
BlueZ D-Bus C or C++ Sample
Bluetoothctl set passkey
https://stackoverflow.com/a/51876272/2215147
Bluez Programming
Linux command line howto accept pairing for bluetooth device without pin
https://stackoverflow.com/a/52982329/2215147
Bluetooth Low Energy in C - using Bluez to create a GATT server
I hope this helps.
I have an interest in writing a scheduler/RTOS project in XC8 using an enhanced MCU with access to the hardware stack.
I am trying to figure out how to control the creation of the software stacks so each task's software stack will get a certain range in the general purpose ram.
Conceptually this is all easy to program in ASM but I want to be able to write C programs and have the software stacks for each task be put into the right address space.
There doesn't appear to be an option to create a separate software stack for a certain section of code or even create multiple software stacks - how do I do it?
Thanks
Stack switching is the responsibility of teh scheduler,not teh compiler - so you will not find a compiler option for that. You have to implement that in the scheduler you are intending to write - that is in fact most of what a scheduler does.
In an RTOS, switching context involves storing all the registers relating to one thread of execution and replacing them with those of another. This includes replacing the stack-pointer - that is how you switch stacks between threads. A context switch is completed when the program-counter register is loaded effecting a jump to the new thread's last execution point (with all its registers, including the stack-pointer restored.
The context switch itself necessarily involves at least a small amount of assembler code, but much of it may still be written in C, and tasks themselves may be written in C.. A good description of a simple RTOS scheduler is provided in Jean Labrosse's book on μC/OS-II - freely available in PDF. A PIC18 port of μC/OS-II is described here with download.
Long-Winded Background
I'm working on parallelising some code for cardiac electrophysiology simulations. Since users can specify their own simulations using an in-built scripting language, I have no way of knowing how to manage the trade-off of communication vs. computation. To combat this, I'm making a sort-of runtime profiler, which will decide how to handle the domain decomposition once it's seen the simulation to be run and the hardware environment that it has to work with.
My question is this:
How is MPI I/O implemented behind the scenes? Is each process actually writing to a single file on some other node, or is each process writing to some sparse file, which will get spliced back together when the file is closed?
Knowing this will help me decide whether to consider I/O operations as communication or computation, and adjust the balance accordingly…
Thanks in advance for any insight you can offer.
Ross
The mechanism for I/O is implementation dependent. In addition, there is not a single style of I/O. Some I/O is cached by the remote ranks and collected by the mpirun process at the end of the run. Some I/O is written to local scratch space as required. Some I/O is written to a NAS/SAN style high performance shared file system.
Some MPI's use 3rd party libraries to support I/O to parallel file systems, and those details may be proprietary. Some file systems are local discs, others are SAN over fiber or InfinBand.
How are you planning to actually measure the time spent in I/O? Are you planning to use the pMPI interface to intercept all the calls into the library?