I have an interest in writing a scheduler/RTOS project in XC8 using an enhanced MCU with access to the hardware stack.
I am trying to figure out how to control the creation of the software stacks so that each task's software stack is placed in a certain range of the general-purpose RAM.
Conceptually this is all easy to program in ASM, but I want to be able to write C programs and have the software stack for each task put into the right address space.
There doesn't appear to be an option to create a separate software stack for a certain section of code, or even to create multiple software stacks - how do I do it?
Thanks
Stack switching is the responsibility of the scheduler, not the compiler - so you will not find a compiler option for it. You have to implement it in the scheduler you are intending to write - that is, in fact, most of what a scheduler does.
In an RTOS, switching context involves storing all the registers relating to one thread of execution and replacing them with those of another. This includes replacing the stack pointer - that is how you switch stacks between threads. A context switch is completed when the program-counter register is loaded, effecting a jump to the new thread's last execution point (with all of its registers, including the stack pointer, restored).
The context switch itself necessarily involves at least a small amount of assembler code, but much of it may still be written in C, and the tasks themselves may be written in C. A good description of a simple RTOS scheduler is provided in Jean Labrosse's book on μC/OS-II, freely available as a PDF. A PIC18 port of μC/OS-II is described here, with a download.
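To make the mechanics concrete, here is a minimal sketch (my illustration, not PIC18/XC8 code and not taken from the book) using POSIX <ucontext.h> on a desktop host. Each task gets its own dedicated stack area - the same idea as giving each task's software stack its own range of general-purpose RAM - and swapcontext() performs the save/restore of registers and stack pointer that a PIC18 scheduler must do in a few lines of assembler:

    /* Hedged sketch: cooperative round-robin scheduling with per-task stacks.
     * On a real PIC18/XC8 target the save/restore below would be replaced by
     * hand-written assembler saving WREG/STATUS/BSR, the FSRs and the
     * hardware stack; the control flow of the scheduler stays the same. */
    #include <stdio.h>
    #include <ucontext.h>

    #define NTASKS     2
    #define STACK_SIZE 16384

    static ucontext_t main_ctx;                      /* the "scheduler" context    */
    static ucontext_t task_ctx[NTASKS];              /* saved context of each task */
    static unsigned char stacks[NTASKS][STACK_SIZE]; /* one private stack per task */
    static int current;

    static void yield(void)                          /* task -> scheduler */
    {
        swapcontext(&task_ctx[current], &main_ctx);
    }

    static void task_body(void)
    {
        for (int pass = 0; pass < 3; pass++) {
            printf("task %d, pass %d\n", current, pass);
            yield();
        }
    }

    int main(void)
    {
        for (int i = 0; i < NTASKS; i++) {
            getcontext(&task_ctx[i]);
            task_ctx[i].uc_stack.ss_sp   = stacks[i];  /* place this task's stack  */
            task_ctx[i].uc_stack.ss_size = STACK_SIZE;
            task_ctx[i].uc_link          = &main_ctx;  /* return here when it ends */
            makecontext(&task_ctx[i], task_body, 0);
        }

        /* Crude round-robin: resume each task in turn. */
        for (int round = 0; round < 3; round++)
            for (current = 0; current < NTASKS; current++)
                swapcontext(&main_ctx, &task_ctx[current]); /* scheduler -> task */
        return 0;
    }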
I have a Go program which cannot be rewritten in Common Lisp for efficiency reasons. How can I run it via Common Lisp?
Options so far:
1. CFFI
Using the foreign function interface seems to me like the "correct" way to do this. However, the research I did led directly to a dead end. If this is the winner, what resources are there to learn about how to interface with Go?
2. Sockets
Leaving the Go program running all the time while listening on a port would work. If this is the best way, I'll continue trying to make it work.
3. Execute System Command
This seems all kinds of wrong.
4. Unknown
Or is there an awesome way I haven't thought of yet?
It depends on what you want to do, but options 1-3 are all viable.
1. CFFI
To get this to work you will need to use an FFI on both the Go and Lisp sides.
You will need to export the appropriate functions from Go as C functions, and then call them using CFFI from Lisp. See https://golang.org/cmd/cgo/#hdr-C_references_to_Go for how to export functions in Go. In this case you would build a dynamically linkable library (a .so or .dll file) rather than an executable.
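As a rough illustration of what the exported surface looks like (the Add function, file names and build commands below are invented for this example, not anything from your project), the Go side exports a function with cgo and is built as a C shared library; the C program then exercises the library exactly the way CFFI on the Lisp side will, which makes it a handy sanity check before writing the Lisp bindings:

    /* Assumed Go side (adder.go):
     *     package main
     *     import "C"
     *     //export Add
     *     func Add(a, b C.int) C.int { return a + b }
     *     func main() {}
     * built with:  go build -buildmode=c-shared -o libadder.so adder.go
     * which also generates libadder.h. */
    #include <stdio.h>
    #include "libadder.h"              /* header generated by the Go build */

    int main(void)
    {
        /* Calling the Go function through its C-level symbol, just as
         * CFFI's defcfun would from Lisp. */
        printf("Add(2, 3) = %d\n", Add(2, 3));
        return 0;
    }

On the Lisp side, CFFI then loads libadder.so and binds Add the same way it binds any ordinary C function.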
2. Sockets (IPC)
The second option is to run your Go program as a daemon and use some form of IPC (such as sockets) to communicate between Lisp and Go. This works well if your program is long-running, or if it makes sense to have a server and one or more clients (the server could just as easily be the Lisp code as the Go code). Sockets in particular are also more flexible: you could write components in other languages, or change the language of one component without having to change the others, as long as you maintain the same protocol. You could also potentially run the components on separate hardware. However, using sockets may hurt performance.
There are other IPC methods available, such as FIFO files (named pipes), SHM, and message queues, but they are more system dependent than sockets.
3. System command (subprocess)
The third way is to start a sub-process. This is a viable option, but it has some caveats. First of all, the behavior of starting a sub-process depends on both the Lisp implementation and the operating system. UIOP smooths out a lot of the implementation differences, but some are too great to overcome. In particular, depending on the implementation you may or may not be able to run a sub-process in parallel. If not, you will have to run a separate command every time you want to communicate with Go, which means waiting for the process to start up every time you need it. You also may or may not be able to send input to the sub-process after starting it.
Another option is to run a command to start a Go process, communicate with it using sockets or some other IPC, and then run a command to stop the process before closing the Lisp program.
Personally, I think that using sockets is the most attractive option, but depending on your needs, one of the other options might be better suited.
CFFI is for using C from Common Lisp. It's an easy way to get new features without too much hassle, since the libraries out there are usually written in C or have a C interface. If you can make a C library from your Go source, then you can do this and use the foreign-function features of CL.
Sockets (or another two-way communication channel) are good if the Go program is a service that is supposed to provide something, e.g. an application server serving HTTP requests. If you only need to use the Go program once per run of the CL program, this usually isn't the way to go.
A subprocess is best if you can run your application with arguments and get back a result that is used in Common Lisp. It's not good if you are going to invoke the Go program many times, as the startup overhead adds up (in which case sockets would be the better choice).
The awesome way to do this is to write the whole thing in Common Lisp. If you choose an implementation with a good compiler and write it well, you might get away with having the application as a CL image. If you need to speed things up, you can focus on the slow parts and optimize them, or you can use CFFI and write the optimizations in C. There is even an inline-C facility for SBCL where you can write C right where you want the optimization in your CL code, so you don't need to put the optimizations in their own file and compile and link them separately.
Is there an MPI implementation that allows nodes to be dynamically added/removed at runtime? Do any recover from complete hardware failure of a node, allowing the node to be repaired and relaunched without restarting the program?
Is there an MPI implementation that allows nodes to be dynamically added/removed at runtime?
This is actually two questions. Nodes usually can be dynamically added at runtime using calls like MPI_Comm_spawn. As @Hristo pointed out in the comments, you should set the correct info key in Open MPI; it may also be possible in other implementations.

As for removing nodes, that's a big area of research at the moment. Most MPI implementations currently have varying levels of success surviving a total node failure. In the current releases of Open MPI, I don't believe there is any support for that sort of failure [citation needed], though there is ongoing work to improve it. In the current version of MPICH, you can pass the flag -disable-auto-cleanup to mpiexec and it will not automatically clean up your application after a process/node failure. However, you'll still have to modify your MPI application to handle this situation. The various derivatives of MPICH (Intel MPI, Cray MPI, IBM MPI, MVAPICH, etc.) don't support this feature AFAIK.

There are other research implementations available that extend the support of the MPI Standard. User Level Failure Mitigation is currently being considered by the standardization body as a way of letting the user handle process failures. There is a research implementation based on Open MPI available at the website linked, and an experimental prototype will also be in the next version of MPICH (3.2).
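For the "adding nodes" half, a minimal MPI_Comm_spawn call looks roughly like the sketch below. The worker binary name and the "host" info key/value are placeholders; as noted above, the exact info key Open MPI expects depends on your version:

    /* Hedged sketch: spawn 4 extra worker processes at runtime. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Info info;
        int errcodes[4];

        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        MPI_Info_set(info, "host", "node05");   /* illustrative key/value only */

        /* Launch 4 copies of ./worker and get an intercommunicator to them. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info,
                       0, MPI_COMM_WORLD, &intercomm, errcodes);

        for (int i = 0; i < 4; i++)
            if (errcodes[i] != MPI_SUCCESS)
                fprintf(stderr, "spawn of worker %d failed\n", i);

        MPI_Info_free(&info);
        MPI_Comm_free(&intercomm);
        MPI_Finalize();
        return 0;
    }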
Do any recover from complete hardware failure of a node, allowing the node to be repaired and relaunched without restarting the program?
This is essentially the same process as above. You would need to use the APIs to remove a process and then somehow find out that it's available and add it back using spawn. These calls have to be made from inside the application though, not externally.
We have a high-traffic website which generates a lot of I/O. Within 10 minutes it has read over 10 GB of data (the w3wp process in question, as seen in Task Manager). For memory issues and application hangs I have been using WinDbg with success, but I don't know how to find the object(s)/method(s) within a process which are responsible for the highest I/O.
Is this even possible?
Edit
The question is: is there a way to profile I/O operations in a .NET assembly, say, a list of threads sorted by highest disk I/O (or something similar that would help me figure out where to look)?
ANTS Performance Profiler
I have used this tool to great success - dealing with finding the specific instructions that were causing ~512 GB of memory on a high-volume web farm to get chewed up within 5-10 minutes. Sounds like a very similar situation to yours.
Now, to be realistic - it's not going to magically solve your problem. It still requires a lot of setup, thorough analysis and detective work. But this tool definitely took the problem from "practically unsolvable" to "solvable within days".
Update:
As I mentioned in the comments (and Ben Emmett echoed), we can use ANTS to monitor memory, file system handles - pretty much any resource consumption - and drill down the call stack to see the effects of specific routines.
I came across this tool, AppDynamics Lite, which displays your application's call costs and performance in a visual way. It might help you find out which functions are making the most costly I/O operations.
Quoting;
Understand the health of your CLR with key metrics like response time, throughput, exception rate, and garbage collection time as well as key system resource like CPU, memory and disk I/O.
Worth giving a shot as it is trial/free for 30 days. Hope it helps.
Ps: I'm not affiliated with AppDynamics in any way.
You can use the (free) Windows Performance Toolkit from Windows 8, which also runs on Windows Vista and later. There you can turn on system-wide profiling to see what is going on in all processes at once. No instrumentation is necessary. Only one reboot is required to set an arcane registry key, which WPRUI.exe does automatically.
With XPerf you can enable I/O Init stack walking so that a call stack is taken for every I/O that is started. The only issue is that the stacks will be broken for 64-bit processes, which means that you will see only the first method above the BCL methods of your code, because of a Windows 7 bug in the stack-walking capabilities of the OS.
A workaround is to NGen your assemblies, move to Server 2012, or switch to x86 for profiling to see deeper call stacks.
You will see all file I/O and CPU activity even without any call stacks, along with the file names and how long the hard disk was in use. That should give you good information about which part of your app is causing the disk I/O. From the partial call stacks you should be able to pinpoint your issue even without full stacks.
The tool will give you much more insight than any commercially available profiler, at the expense of having to learn how to use it. Since the call stacks do not end at your code or in user mode but go down into the kernel, you can also determine if, for example, the virus scanner is causing significant I/O delays. But you do need to know how your processor works. This toolset was originally aimed at kernel devs, which explains why you see so many seemingly useless columns.
In the picture below you see file I/O and CPU consumption stacked. When you select your high-I/O file in the disk I/O graph, all related call stacks which were taken while that I/O was active are highlighted in the CPU consumption graph. This way you can directly navigate from the I/O to your potentially blocked threads.
I have an interactive program with a high start-up cost. After start-up, I'd like to fork the process into separate concurrent sessions. Ideally each separate session would become a GNU screen window but being able to individually telnet/ssh to each session would be fine too.
It shouldn't be too hard to write this from scratch but it seems like something that should have been done/considered before and maybe there are reasons why this is a bad idea...
I know that an alternative approach is to use shared memory for the data that's expensive to initialize. The reason I'm reluctant to go down that path is that the shared data uses C++ data structures with pointers, which makes it hard to mmap it into an unrelated process.
This is what any database does - the startup is phenomenally expensive, but the DB provides several different means of connecting - for example, Oracle's BEQ protocol.
Telnet has issues; consider ssh instead. Either way, consider a daemon that answers requests to connect on a port (you would probably use AF_UNIX sockets here), then creates a separate session.
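A bare-bones sketch of that daemon (paths and names invented, error handling omitted): the parent pays the start-up cost once, then every accepted connection is forked into its own session that talks over the socket.

    /* Minimal AF_UNIX "session server" sketch. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SOCK_PATH "/tmp/myapp.sock"        /* illustrative path */

    static void expensive_startup(void) { /* build the costly data structures once */ }
    static void run_session(void)       { puts("interactive session running"); }

    int main(void)
    {
        expensive_startup();                   /* pay the start-up cost once  */
        signal(SIGCHLD, SIG_IGN);              /* don't leave zombie sessions */

        int srv = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, SOCK_PATH, sizeof addr.sun_path - 1);
        unlink(SOCK_PATH);
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 8);

        for (;;) {
            int client = accept(srv, NULL, NULL);
            if (client < 0)
                continue;
            if (fork() == 0) {                 /* child: one interactive session */
                setsid();                      /* its own session, like a login  */
                dup2(client, STDIN_FILENO);    /* the socket becomes its stdin   */
                dup2(client, STDOUT_FILENO);   /* ...and stdout                  */
                close(srv);
                run_session();                 /* reuses the pre-built state     */
                _exit(0);
            }
            close(client);                     /* parent just keeps accepting    */
        }
    }

Each GNU screen window (or ssh session) would then just run a small client that connects to the socket, for example with socat.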
Stevens' Advanced Programming in the UNIX Environment and Rochkind's Advanced UNIX Programming both have discussions and complete examples. Since my Stevens book seems to have gone on extended holiday, see Rochkind 4.3 and 4.10.
And no, there is no pending doom for using this approach.
Over the years I've worked on a number of microcontroller-based projects; mostly with Microchip's PICs. I've used various microcontroller simulators, and while they can be very helpful at times, I often find myself frustrated. In real life microcontrollers never exist alone and the firmware's behavior is dependent on the environment. However, none of the sims I've used provide decent support for anything outside the microcontroller.
My first thought was to model the entire board in Verilog. But, I'd rather not create an entire CPU model, and I haven't had much luck finding existing models for the chips I use. Regardless, I really don't need, or want, to simulate the proc at that level of detail, and I'd like to retain the debugging facilities provided by a regular processor sim.
It seems to me that the ideal solution would be a hybrid simulator that interfaces a traditional processor simulator with a Verilog model.
Does such a thing exist?
I've used the Altera Nios II processor embedded on an FPGA. Altera provides a toolchain for simulating the CPU (with its software) together with your custom logic in a simulator. I suppose a similar setup can be achieved by downloading a VHDL/Verilog core of your CPU (did you try OpenCores? They have lots of stuff there).
But keep in mind that it is going to be mind-bogglingly slow, so don't expect to simulate whole complex processes this way. The best you can hope for is simulating fine-grained software-hardware interaction points to debug problems. If you need a deeper simulation, consider running it on an FPGA with built-in monitoring code.
For the "simulate the whole board" approach,
The Free Model Foundry has a large number of models, some in VHDL and others in Verilog, that are available now, but you'll need to pay to have new models created. These are very helpful in making sure the board is built correctly.
But I think the more common approach when debugging your PIC is to just build a board and then work on the firmware. In the chip world (where the firmware runs on a microprocessor in a chip that hasn't gone to fab yet), people often resort to very expensive systems (or rent time on them) that allow compiling part of the design into an emulator while the rest of the design runs in the normal simulator environment. Without the barrier of an expensive mask set for the chip, that cost is just not justifiable for a circuit board. I've heard of some creative applications of Simulink (MathWorks) with FPGAs, but my recollection is that one either ran the system on the computer or programmed the device and ran the same thing in real time.
I believe both Cadence (ask about Palladium) and Mentor Graphics have that integrated solution if you have the money to spend on it.
What I have done recently is create an interface between the simulation environment and the host system. Different HDL simulators have different interfaces, and getting the simulator NOT to think in batch mode (the traditional simulation model) but instead run forever like a real design is half of the problem.
Then, from the host, using C (or whatever) you can create abstractions that may or may not allow you to write your application software for the final target (depending on what language and compiler capabilities you have). For example, you can make generic poke and peek functions: on the final target they actually poke and peek memory or I/O, but for simulation the abstraction talks to a testbench in the simulator that performs the same memory cycle.
I went one step further and used (Berkeley) sockets between the host and the testbench so that the simulation can keep running while the host applications stop and start. Not unlike a real processor with an OS, where you start applications, run them to completion, and start another. At least for test applications; for delivery you probably only have one app.
By creating these abstraction layers I can write real applications that will be used on the target when it is built. Along the way you can use software simulation of the logic initially, then if you like build an FPGA with an abstraction interface (throw-away logic), say a UART for example. Replace the shim between the application's abstraction layer and the simulator with a UART interface, or whatever. Then, when you marry the processor and logic in the same chip or on the same board, replace the abstraction layer again with direct calls to whatever interfaces they always thought they were talking to. If something breaks and you have retained the abstraction layer, you can take the application back to the simulation model and have access to all of your logic internals.
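A sketch of that poke/peek abstraction (all names invented for illustration): the application only ever calls through bus->poke and bus->peek, and the backend is swapped between direct memory access on the target and a message to the simulation testbench.

    /* Hedged sketch of a poke/peek abstraction layer. */
    #include <stdint.h>
    #include <stdio.h>

    struct bus_ops {
        void     (*poke)(uint32_t addr, uint32_t data);
        uint32_t (*peek)(uint32_t addr);
    };

    /* Backend 1: the real target, where an address really is an address. */
    static void real_poke(uint32_t addr, uint32_t data)
    {
        *(volatile uint32_t *)(uintptr_t)addr = data;
    }
    static uint32_t real_peek(uint32_t addr)
    {
        return *(volatile uint32_t *)(uintptr_t)addr;
    }

    /* Backend 2: simulation.  A tiny array stands in for the testbench here;
     * in the real setup these two functions would send a read/write message
     * over the socket to the HDL testbench and wait for the reply. */
    static uint32_t fake_mem[256];
    static void     sim_poke(uint32_t addr, uint32_t data) { fake_mem[(addr >> 2) & 255] = data; }
    static uint32_t sim_peek(uint32_t addr)                { return fake_mem[(addr >> 2) & 255]; }

    static const struct bus_ops real_bus = { real_poke, real_peek };
    static const struct bus_ops sim_bus  = { sim_poke,  sim_peek  };

    /* The application is written once against this pointer; only the pointer
     * changes when you move from simulation to FPGA to final hardware. */
    static const struct bus_ops *bus = &sim_bus;

    int main(void)
    {
        (void)real_bus;                    /* select &real_bus on the target build */
        bus->poke(0x1000, 0xdeadbeef);
        printf("read back 0x%08x\n", (unsigned)bus->peek(0x1000));
        return 0;
    }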
Specifically, this time around I am using an HDL called Cyclicity CDL, which is on SourceForge. The documentation needs some help, but the examples may get you going, and it produces synthesizable Verilog, so you get an extra win there. I threw out all the scripting batch stuff other than the bare minimum needed to connect and start a C simulation model. So my testbench is in C (well, C++ technically), and the sockets layer was done there. The output can be .vcd files, which GTKWave reads. Basically you can do the bulk of your HDL design using open-source software with no licenses, etc. By adding one or two lines of code to the CDL simulation part I was able to have it run as an infinite loop, which works quite well; there don't appear to be any memory leaks, etc.
Both ModelSim and Cadence have standardized ways of connecting host C programs to the simulation world, and from there you can use an IPC mechanism to get host applications talking to an abstraction-layer API.
This is probably way overkill for a PIC; I gave up on PICs a while ago for the faster and more C-friendly ARM-based micros anyway. There is (or was) an open-core PIC that you could simply incorporate into your simulation, even though that is not what you are trying to do here.
Not that I've seen. Your best bet is to properly define the interfaces and behavior between the uC and the FPGA, and then define a series of test waveforms that can be applied using an automated tester. You would have to build the automated tester out of an FPGA or uC (or perhaps a logic analyzer has some such functionality): apply a waveform, watch interrupts, breakpoints, etc. If you really want, Opencores.org has PIC- and AVR-like 8-bit uC cores defined in VHDL, so you could implement your entire project on the FPGA and then just debug that.
Generally there is no need to model the CPU at the RTL level, since you don't really care what it does bit by bit; you generally care about what it does in terms of register values, memories and bus accesses.
The simplest is called a Bus Functional Model (BFM). This just generates the reads and writes that the CPU would do, often driven from a text file. These are available for some CPUs and many popular buses (e.g. PCI, PCIe), and they simulate super fast.
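As a toy illustration of the idea (the file format and function names are made up, and a real BFM would drive bus signals inside the simulator rather than call C stubs), a text-file-driven BFM is essentially just a replay loop:

    /* Hedged sketch: replay "W <addr> <data>" / "R <addr>" lines as bus cycles. */
    #include <stdint.h>
    #include <stdio.h>

    static void do_write(uint32_t addr, uint32_t data)
    {
        printf("BUS WRITE addr=0x%08x data=0x%08x\n", (unsigned)addr, (unsigned)data);
    }

    static uint32_t do_read(uint32_t addr)
    {
        printf("BUS READ  addr=0x%08x\n", (unsigned)addr);
        return 0;                  /* a real BFM returns the sampled bus data */
    }

    int main(void)
    {
        FILE *f = fopen("stimulus.txt", "r");    /* illustrative file name */
        if (!f) { perror("stimulus.txt"); return 1; }

        char line[128];
        while (fgets(line, sizeof line, f)) {
            char op;
            unsigned addr, data;
            int n = sscanf(line, " %c %x %x", &op, &addr, &data);
            if (n == 3 && op == 'W')
                do_write(addr, data);
            else if (n >= 2 && op == 'R')
                do_read(addr);
        }
        fclose(f);
        return 0;
    }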
The next step up is a functional cycle-accurate model. Those simulate fast. They are often encrypted.
Last is a full RTL model. Those usually are only available if you are working closely with the CPU vendor, e.g. using their core in your ASIC. Typically these are encrypted, unless you are a huge company.
Memory models are typically cycle-accurate (e.g. Micron).
My workmates from the hardware department use FPGA simulation software quite often to find timing-bugs and trace down strange behaviours.
Simulating one or two milliseconds can take several hours, so using the simulator for anything but very small things is not feasible.
You may want to have a look at SystemC, though: http://en.wikipedia.org/wiki/SystemC