I have an interest in writing a scheduler/RTOS project in XC8 using an enhanced MCU with access to the hardware stack.
I am trying to figure out how to control the creation of the software stacks so that each task's software stack gets a certain range in the general purpose RAM.
Conceptually this is all easy to program in ASM but I want to be able to write C programs and have the software stacks for each task be put into the right address space.
There doesn't appear to be an option to create a separate software stack for a certain section of code or even create multiple software stacks - how do I do it?
Thanks
Stack switching is the responsibility of the scheduler, not the compiler - so you will not find a compiler option for that. You have to implement that in the scheduler you are intending to write - that is in fact most of what a scheduler does.
In an RTOS, switching context involves storing all the registers relating to one thread of execution and replacing them with those of another. This includes replacing the stack-pointer - that is how you switch stacks between threads. A context switch is completed when the program-counter register is loaded, effecting a jump to the new thread's last execution point (with all its registers, including the stack-pointer, restored).
The context switch itself necessarily involves at least a small amount of assembler code, but much of it may still be written in C, and tasks themselves may be written in C. A good description of a simple RTOS scheduler is provided in Jean Labrosse's book on μC/OS-II - freely available in PDF. A PIC18 port of μC/OS-II is described here with download.
I can't seem to find this specific implementation detail, or even a pointer to where in an OS book to find this.
Basically, main thread calls an async task (to be run later) on itself. So... when does it run?
Does it wait for the run loop to finish? Or does it just randomly interrupt the run-loop in the middle of any function?
I understand the registers will be the same (unless separate thread), but not really the instruction pointer and what happens to the stack, if anything does happen.
Thank you
In C# the task is scheduled to be run on the current SynchronizationContext. The context basically has a queue of tasks which it schedules to run on the threads it is associated with. In a GUI app there is only one such thread, so the task is scheduled to run there.
The GUI thread is not interrupted but it executes the task when it finishes all other tasks preceding it in the queue.
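The same queue-then-run behaviour can be observed outside C#. As a rough analogy only (Python's asyncio event loop stands in for the SynchronizationContext here, which is an assumption of this sketch and not a description of the C# internals), a callback scheduled from the loop's own thread runs only after the code that scheduled it gives control back to the loop:

```python
import asyncio

def run_demo():
    """Return the order of events on a single-threaded event loop."""
    order = []

    def queued_task():
        order.append("queued task runs")

    async def main():
        loop = asyncio.get_running_loop()
        order.append("main: before scheduling")
        loop.call_soon(queued_task)      # goes onto the loop's queue, does not run yet
        order.append("main: after scheduling")
        await asyncio.sleep(0)           # hand control back to the loop
        order.append("main: resumed")

    asyncio.run(main())
    return order
```

The queued task never interrupts `main` in the middle of a statement; it runs at the point where `main` returns control to the loop.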
The threads of a process all share the same address space, not the same CPU registers. How thread scheduling is done depends on the programming language and the O/S. Usually there are explicit scheduling points, such as returning from a system call, blocking while awaiting I/O completion, or between p-code instructions for interpreted languages. Some O/S implementations reschedule depending on how long a thread has run, for time-based scheduling. Often languages include a function that explicitly offers the CPU to any other thread or process by transferring control to the process or thread scheduler component of the O/S.
The act of switching from one thread or process to another is known as a context switch and is carefully tuned code because this is often done thousands of times per second. This can make the code difficult to follow.
The best explanation of this I've ever seen is in the classic The Design of the UNIX Operating System: http://www.amazon.com/The-Design-UNIX-Operating-System/dp/0132017997
Non-blocking sends/recvs return immediately in MPI and the operation is completed in the background. The only way I can see that happening is that the current process/thread creates another process/thread, loads an image of the send/recv code into it, and then itself returns. This new process/thread then completes the operation and sets a flag somewhere, which the Wait/Test call reads. Am I correct?
There are two ways that progress can happen:
1. In a separate thread. This is usually an option in most MPI implementations (usually at configure/compile time). In this version, as you speculated, the MPI implementation has another thread that runs a separate progress engine. That thread manages all of the MPI messages and sending/receiving data. This way works well if you're not using all of the cores on your machine as it makes progress in the background without adding overhead to your other MPI calls.
2. Inside other MPI calls. This is the more common way of doing things and is the default for most implementations I believe. In this version, non-blocking calls are started when you initiate the call (MPI_I<something>) and are essentially added to an internal queue. Nothing (probably) happens on that call until you make another call to MPI later that actually does some blocking communication (or waits for the completion of previous non-blocking calls). When you enter that future MPI call, in addition to doing whatever you asked it to do, it will run the progress engine (the same thing that's running in a thread in version #1). Depending on what the MPI call that's supposed to be happening is doing, the progress engine may run for a while or may just run through once. For instance, if you called MPI_WAIT on an MPI_IRECV, you'll stay inside the progress engine until you receive the message that you're waiting for. If you are just doing an MPI_TEST, it might just cycle through the progress engine once and then jump back out.
3. More exotic methods. As Jeff mentions in his post, there are more exotic methods that depend on the hardware on which you're running. You may have a NIC that will do some magic for you in terms of moving your messages in the background or some other way to speed up your MPI calls. In general, these are very specific to the implementation and hardware on which you're running, so if you want to know more about them, you'll need to be more specific in your question.
All of this is specific to your implementation, but most of them work in some way similar to this.
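To make the "progress inside other MPI calls" idea concrete, here is a toy model in plain Python. It is not the MPI API and not how any particular implementation is written: the class `ToyProgressEngine`, its methods, and the simulated network delay are all hypothetical names invented for this sketch. It only illustrates that the non-blocking call queues a request, a test-style call runs the progress engine once, and a wait-style call stays in the engine until the request completes.

```python
from collections import deque

class ToyProgressEngine:
    """Toy model of polling-based progress. Hypothetical, NOT the MPI API."""

    def __init__(self):
        self.pending = deque()    # receive requests posted but not yet matched
        self.in_flight = deque()  # [polls_until_arrival, payload] pairs
        self.polls = 0            # how many times the engine has been run

    def simulate_incoming(self, payload, polls_until_arrival):
        """Pretend a message is on the wire and arrives after some polls."""
        self.in_flight.append([polls_until_arrival, payload])

    def irecv(self):
        """Non-blocking receive: queue a request and return immediately."""
        request = {"done": False, "data": None}
        self.pending.append(request)
        return request

    def _progress_once(self):
        """One pass of the progress engine."""
        self.polls += 1
        if self.in_flight:
            self.in_flight[0][0] -= 1
        if self.pending and self.in_flight and self.in_flight[0][0] <= 0:
            _, payload = self.in_flight.popleft()
            request = self.pending.popleft()
            request["data"] = payload
            request["done"] = True

    def test(self, request):
        """Test-style call: run the engine once, then report completion."""
        self._progress_once()
        return request["done"]

    def wait(self, request):
        """Wait-style call: stay in the engine until the request completes."""
        while not request["done"]:
            if not self.in_flight:
                raise RuntimeError("nothing in flight; toy wait would never return")
            self._progress_once()
        return request["data"]

def demo_progress():
    engine = ToyProgressEngine()
    engine.simulate_incoming("hello", polls_until_arrival=3)
    request = engine.irecv()
    polls_after_irecv = engine.polls      # the engine has not run yet
    first_test = engine.test(request)     # one pass only, message not here yet
    data = engine.wait(request)           # loops until the message arrives
    return polls_after_irecv, first_test, data, engine.polls
```

In this model nothing moves until the caller re-enters the library, which is why a program that posts a non-blocking operation and then computes for a long time without calling back in may see little overlap.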
Are you asking whether a separate thread for message processing is the only way to implement non-blocking operations?
If so, the answer is no. In fact, I think many setups use a different strategy: usually the message processing makes progress during every MPI call. I'd recommend having a look at this blog entry by Jeff Squyres.
See the answer by Wesley Bland for a more complete answer.
I'm trying to get my head around the concept of a cooperative multitasking system and exactly how it works in a single-threaded application.
My understanding is that this is a "form of multitasking in which multiple tasks execute by voluntarily ceding control to other tasks at programmer-defined points within each task."
So if you have a list of tasks and one task is executing, how do you decide when to pass execution to another task? And when you give execution back to a previous task, how do you resume from where you were previously?
I find this a bit confusing because I don't understand how this can be achieved without a multithreaded application.
Any advice would be very helpful :)
Thanks
In your specific scenario where a single process (or thread of execution) uses cooperative multitasking, you can use something like Windows' fibers or POSIX setcontext family of functions. I will use the term fiber here.
Basically when one fiber is finished executing a chunk of work and wants to voluntarily allow other fibers to run (hence the "cooperative" term), it either manually switches to the other fiber's context or more typically it performs some kind of yield() or scheduler() call that jumps into the scheduler's context, then the scheduler finds a new fiber to run and switches to that fiber's context.
What do we mean by context here? Basically the stack and registers. There is nothing magic about the stack, it's just a block of memory the stack pointer happens to point to. There is also nothing magic about the program counter, it just points to the next instruction to execute. Switching contexts simply saves the current registers somewhere, changes the stack pointer to a different chunk of memory, updates the program counter to a different stream of instructions, copies that context's saved registers into the CPU, then does a jump. Bam, you're now executing different instructions with a different stack. Often the context switch code is written in assembly that is invoked in a way that doesn't modify the current stack or it backs out the changes, in either case it leaves no traces on the stack or in registers so when code resumes execution it has no idea anything happened. (Again, the theme: we assume that method calls fiddle with registers, push arguments to the stack, move the stack pointer, etc but that is just the C calling convention. Nothing requires you to maintain a stack at all or to have any particular method call leave any traces of itself on the stack).
Since each stack is separate, you don't have some continuous chain of seemingly random method calls eventually overflowing the stack (which might be the result if you naively tried to implement this scheme using standard C methods that continuously called each other). You could implement this manually with a state machine where each fiber kept a state machine of where it was in its work, periodically returning to the calling dispatcher's method, but why bother when actual fiber/co-routine support is widely available?
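As a small illustration of the yield-to-scheduler pattern, here is a sketch using Python generators in place of real fibers. This is an analogy and an assumption of the example, not the Windows fiber or setcontext API: a generator keeps its own frame (local variables plus resume point), which plays the role of the saved stack and program counter, but it can only yield from its own body rather than from arbitrarily deep calls. The names `fiber` and `run_round_robin` are made up for the sketch.

```python
from collections import deque

def fiber(name, steps, log):
    """A cooperative task: do a chunk of work, then voluntarily yield."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # yield point: control goes back to the scheduler

def run_round_robin(fibers):
    """Minimal scheduler: resume each fiber in turn until all have finished."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # switch into the fiber until its next yield
        except StopIteration:
            continue             # fiber finished, drop it
        ready.append(current)    # still has work, goes to the back of the queue

def demo_fibers():
    log = []
    run_round_robin([fiber("A", 2, log), fiber("B", 3, log)])
    return log
```

Each call to `next` resumes a fiber exactly where it left off, with its loop counter intact, and the scheduler only regains control when the fiber chooses to yield.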
Also remember that cooperative multitasking is orthogonal to processes, protected memory, address spaces, etc. Witness Mac OS 9 or Windows 3.x. They supported the idea of separate processes. But when you yielded, the context was changed to the OS context, allowing the OS scheduler to run, which then potentially selected another process to switch to. In theory you could have a full protected virtual memory OS that still used cooperative multitasking. In those systems, if an errant process never yielded, the OS scheduler never ran, so all other processes in the system were frozen. **
The next natural question is what makes something pre-emptive... The answer is that the OS schedules an interrupt timer with the CPU to stop the currently executing task and switch back to the OS scheduler's context regardless of whether the current task cares to release the CPU or not, thus "pre-empting" it.
If the OS uses CPU privilege levels, the (kernel configured) timer is not cancelable by lower level (user mode) code, though in theory if the OS didn't use such protections an errant task could mask off or cancel the interrupt timer and hijack the CPU. There are some other scenarios like IO calls where the scheduler can be invoked outside the timer, and the scheduler may decide no other process has higher priority and return control to the same process without a switch... And in reality most OSes don't do a real context switch here because that's expensive; instead the scheduler code runs inside the context of whatever process was executing, so it has to be very careful not to step on the stack, to save register states, etc.
** You might ask why not just fire a timer if yield isn't called within a certain period of time. The answer lies in multi-threaded synchronization. In a cooperative system, you don't have to bother taking locks, worry about re-entrance, etc because you only yield when things are in a known good state. If this mythical timer fires, you have now potentially corrupted the state of the program that was interrupted. If programs have to be written to handle this, congrats... You now have a half-assed pre-emptive multitasking system. Might as well just do it right! And if you are changing things anyway, may as well add threads, protected memory, etc. That's pretty much the history of the major OSes right there.
The basic idea behind cooperative multitasking is trust - that each subtask will relinquish control, of its own accord, in a timely fashion, to avoid starving other tasks of processor time. This is why tasks in a cooperative multitasking system need to be tested extremely thoroughly, and in some cases certified for use.
I don't claim to be an expert, but I imagine cooperative tasks could be implemented as state machines, where passing control to the task would cause it to run for the absolute minimal amount of time it needs to make any kind of progress. For example, a file reader might read the next few bytes of a file, a parser might parse the next line of a document, or a sensor controller might take a single reading, before returning control back to a cooperative scheduler, which would check for task completion.
Each task would have to keep its internal state on the heap (at object level), rather than on the stack frame (at function level) like a conventional blocking function or thread.
And unlike conventional multitasking, which relies on a hardware timer to trigger a context switch, cooperative multitasking relies on the code being written in such a way that each step of each long-running task is guaranteed to finish in an acceptably small amount of time.
The tasks will execute an explicit wait or pause or yield operation which makes the call to the dispatcher. There may be different operations for waiting on IO to complete or explicitly yielding in a heavy computation. In an application task's main loop, it could have a *wait_for_event* call instead of busy polling. This would suspend the task until it has input to process.
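A task loop built around such a wait call might look like the following sketch. The dispatcher, the `"wait_for_event"` request string, and the event names are all hypothetical; a Python generator is used so that the task can suspend in the middle of its loop and be resumed with the event it was waiting for.

```python
def echo_task(log):
    """Application task whose main loop waits for an event instead of polling."""
    while True:
        event = yield "wait_for_event"   # suspend until the dispatcher has input
        if event == "quit":
            return
        log.append(f"handled {event}")

def dispatch(task, events):
    """Hypothetical dispatcher: wakes the task only when an event is available."""
    next(task)                           # run the task up to its first wait
    for event in events:
        try:
            task.send(event)             # resume the task with the event
        except StopIteration:
            break                        # the task has returned

def demo_events():
    log = []
    dispatch(echo_task(log), ["key a", "key b", "quit", "key c"])
    return log
```

While the task is suspended at its wait it consumes no CPU time, and the dispatcher is free to run other tasks.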
There may also be a time-out mechanism for catching runaway tasks, but it is not the primary means of switching (or else it wouldn't be cooperative).
One way to think of cooperative multitasking is to split a task into steps (or states). Each task keeps track of the next step it needs to execute. When it's the task's turn, it executes only that one step and returns. That way, in the main loop of your program you are simply calling each task in order, and because each task only takes a small amount of time to complete a single step, we end up with a system which allows all of the tasks to share CPU time (i.e. cooperate).
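A minimal sketch of this step-based approach, with the task's progress stored on an object rather than on a call stack, could look like this. The `LineCounter` task and `main_loop` function are illustrative names chosen for the example, not part of any library.

```python
class LineCounter:
    """Task split into steps: counts non-blank lines, one line per step."""

    def __init__(self, lines):
        self.lines = lines
        self.position = 0    # the next step this task needs to execute
        self.count = 0
        self.done = False

    def step(self):
        """Process exactly one line, then return to the main loop."""
        if self.position >= len(self.lines):
            self.done = True
            return
        if self.lines[self.position].strip():
            self.count += 1
        self.position += 1

def main_loop(tasks):
    """Call each unfinished task in order until every task reports completion."""
    rounds = 0
    while not all(task.done for task in tasks):
        for task in tasks:
            if not task.done:
                task.step()
        rounds += 1
    return rounds
```

For example, `main_loop([LineCounter(["x", "", "y"]), LineCounter(["z"])])` interleaves the two counters one line at a time. No task keeps control for longer than one `step` call, which is the guarantee the scheme depends on.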
Is it possible to hot plug an additional node (host) into a working OpenMPI app? We're talking about production environment where we cannot afford even a 5 second downtime.
There are two scenarios I'm interested in:
We just would like to enhance the computing power by adding one more broadcast listener.
A node died, the master node handles it well and reassigns the task to somebody else. The system administrator comes in, restarts the dead node and plugs it back into the cluster.
Which platform independent MPI implementation would be best for the scenario above? OpenMPI is not a must here.
MPI-2 -- any implementation -- does allow dynamic processes, and in fact adding processes is currently much more feasible than removing processes. You can use MPI_COMM_SPAWN to launch new processes with a given executable, and that returns an intercommunicator that can be used to communicate between the old (original) processes and the newly spawned ones.
The trick here is that nothing will automatically detect the new node. You'll have to have some process keeping an eye out for new nodes and SPAWN something on them. If the new nodes will just be listeners to the master node, that's probably the best case, as only the master node really needs to know about them. The invocation to ensure the spawn happens on the new node and not somewhere else will be done through the info argument to spawn, and may be implementation dependent.