If a program is written in a single threaded language, does that mean that when it is executed only a single process exists for it at a time (no concurrent processes)?
A process is just a separate memory space. A thread is just a unit of execution within a process. A process can have multiple threads, but a thread cannot be shared between multiple processes.
When you run a single-threaded program (assuming the language runtime does not introduce any other threads), there exists only one thread in the process. That doesn't mean there is only one process for that program, because multiple instances of the same program might be running.
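To make the distinction concrete, here is a minimal C sketch (not from the original question): pthread_create() adds a second thread inside the same process, while fork() creates a second process with its own address space running the same program.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <pthread.h>

/* A second thread: a new unit of execution inside the SAME process. */
static void *thread_body(void *arg) {
    printf("thread running in pid %d\n", (int)getpid());
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);  /* same address space */
    pthread_join(t, NULL);

    /* A second process: a separate address space running the same program image. */
    pid_t pid = fork();
    if (pid == 0) {
        printf("child process, pid %d\n", (int)getpid());
        _exit(0);
    }
    wait(NULL);
    printf("parent process, pid %d\n", (int)getpid());
    return 0;
}
```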
Is it possible that one process is running in kernel mode and another in user mode at the same time?
I know it's not a coding question, but please guide me if someone knows the answer.
For two processes to actually be running at the same time, you must have multiple CPUs. And indeed, when you have multiple CPUs, what runs on the different CPUs is very loosely coupled: you can definitely have one process running user code on one CPU while another process runs kernel code (e.g., doing some work inside a system call) on another CPU.
If you are asking about just one CPU, then you can't have two running processes at the same time. But what you can have is two runnable processes, i.e., two processes which are both ready to run, but since there is just one CPU, only one of them can actually run. One of the runnable processes might be in user mode - e.g., a long-running tight loop that was preempted when its time quota expired. The other runnable process might be in kernel mode - e.g., a process that did a read() system call from disk: the kernel sent the read request to the disk and the process went to sleep; now that the read request has completed, the process is ready to run again in kernel mode and finish the read() call.
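If it helps to picture it, here are two hypothetical toy programs matching that scenario: the first would typically be preempted in user mode, the second blocks inside the kernel and becomes runnable in kernel mode when the disk I/O completes (the file path is a placeholder).

```c
/* Process A: a tight loop that is preempted while running user code. */
int main(void) {
    volatile unsigned long x = 0;
    for (;;) x++;   /* time slice expires here; the process stays in user mode */
}
```

```c
/* Process B: blocks inside read(); when the disk I/O completes it becomes
   runnable again in kernel mode, to finish the read() call. */
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    char buf[4096];
    int fd = open("/some/file", O_RDONLY);   /* placeholder path */
    if (fd >= 0) {
        read(fd, buf, sizeof buf);
        close(fd);
    }
    return 0;
}
```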
Yes, it is possible. Even multiple processes can be in the kernel mode at the same time.
It's just that a single process cannot be in both modes at the same time.
Correct me if I'm wrong, but I suppose there are no processes in kernel mode, only threads.
I can't seem to find this specific implementation detail, or even a pointer to where in an OS book to find this.
Basically, the main thread schedules an async task (to be run later) on itself. So... when does it run?
Does it wait for the run loop to finish? Or does it just randomly interrupt the run loop in the middle of some function?
I understand the registers will be the same (unless it's a separate thread), but I'm not sure about the instruction pointer, or what happens to the stack, if anything happens at all.
Thank you
In C# the task is scheduled to run on the current SynchronizationContext. The context basically has a queue of tasks which it schedules to run on the threads it is associated with; in a GUI app there is only one such thread, so the task is scheduled to run there.
The GUI thread is not interrupted; it executes the task once it has finished all the other tasks preceding it in the queue.
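Conceptually (this is just a sketch of the idea in C, not the actual C# machinery), the GUI thread runs a loop that drains a queue, and posting an async task only appends to that queue, so it can never interrupt code already running on that thread:

```c
#include <stddef.h>

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

/* A tiny fixed-size task queue; single-threaded, so no locking is needed. */
static task_t queue[64];
static size_t head, tail;

/* "Scheduling" an async task just appends it to the queue. */
static void post_task(task_fn fn, void *arg) {
    queue[tail % 64] = (task_t){ fn, arg };
    tail++;
}

/* The run loop: handle events, then drain queued tasks in order. */
static void run_loop(void) {
    for (;;) {
        /* ... process input events, repaint, etc. ... */
        while (head != tail) {
            task_t t = queue[head % 64];
            head++;
            t.fn(t.arg);   /* runs here, between iterations, never mid-function */
        }
    }
}
```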
The threads of a process all share the same address space, but not the same CPU registers. How thread scheduling is done depends on the programming language and the O/S. Usually there are explicit scheduling points, such as returning from a system call, blocking while awaiting I/O completion, or between p-code instructions for interpreted languages. Some O/S implementations also reschedule based on how long a thread has run (time-based scheduling). Often languages include a function that explicitly offers the CPU to any other thread or process by transferring control to the process or thread scheduler component of the O/S.
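On POSIX systems, for example, sched_yield() is such a function; a minimal sketch:

```c
#include <sched.h>

int main(void) {
    for (int i = 0; i < 1000; ++i) {
        /* ... do one small unit of work ... */
        sched_yield();   /* explicitly offer the CPU to any other runnable thread or process */
    }
    return 0;
}
```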
The act of switching from one thread or process to another is known as a context switch and is carefully tuned code because this is often done thousands of times per second. This can make the code difficult to follow.
The best explanation of this I've ever seen is the classic http://www.amazon.com/The-Design-UNIX-Operating-System/dp/0132017997.
I have two programs, master and slave. My master does data decomposition and the slaves do computation on their parts of the decomposed data. MPI_Scatterv is used to distribute the work. I execute my master program first; it then dynamically spawns child (slave) processes, and the slaves execute different code, i.e. the computation. Now the master has to collect the results from the slaves again and execute the next level of decomposition. How do I do that using MPI? I actually want to execute my master and slave code alternately. How can I implement this?
Thank you in advance.
MPI-2 (if I remember correctly) introduced mechanisms for dynamic process management; you might care to search for MPI_Comm_spawn to start learning about those mechanisms. So it is certainly possible to write an MPI program which alternates between one process running the master task and multiple processes running the worker tasks (the term slave is deprecated). It's even possible to design your computation so that one program runs the master task and another program runs the (multiple) worker tasks, and to use MPI for passing messages between the two.
BUT (that's a big but) I don't think that many resource managers (either the humans who manage parallel computer systems or the operating system and systems software such as job managers) support such dynamic process management. Imagine the complexities of scheduling, and managing, two or more programs with the basic design that you propose. Just as program A tries to fire up 2^10 worker processes so too does program B, and program C, while program D tries to drop 2^8 worker processes; all this on a cluster with only 2^10 processors (or cores). It's probably not too difficult to construct scenarios where the throughput of jobs on the cluster falls towards zero as multiple jobs contend for scarce resources.
If your platform supports dynamic process management, go right ahead. In the far more likely case that your platform does not, you have at least two choices; which one you choose depends on the ratio of master:worker time and probably other factors too. You could:
Do what most of us have always done and continue to do: request a total number of processors for the entire job, leaving all but one of them idle during the master-only phases. Wasteful perhaps, but easy for the resource managers to cope with, and relatively easy to program too (see the sketch after these two options).
If the master does a lot of work between worker phases, you could modify your program so that the master and worker are separate programs. First have the master execute on one process and, as it finishes, submit a request to the job management system to initiate the first phase of the worker computation. Have that, in turn, initiate the execution of the next master phase, and so on.
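Here is a rough sketch of the first option in C (the sizes and the computation are placeholders): a single MPI job in which rank 0 does the decomposition between phases, and every rank, rank 0 included, takes part in the worker phases via MPI_Scatterv/MPI_Gatherv.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_levels = 4;        /* placeholder: number of decomposition levels */
    const int total    = 1 << 20;  /* placeholder: total data size (assumed divisible by size) */
    int *counts = malloc(size * sizeof *counts);
    int *displs = malloc(size * sizeof *displs);
    double *data = NULL, *result = NULL;

    for (int level = 0; level < n_levels; ++level) {
        if (rank == 0) {
            /* Master phase: only rank 0 works here, the other ranks sit idle. */
            data   = realloc(data,   total * sizeof *data);
            result = realloc(result, total * sizeof *result);
            for (int r = 0; r < size; ++r) {
                counts[r] = total / size;
                displs[r] = r * (total / size);
            }
            /* ... decomposition of `data` for this level ... */
        }
        /* Every rank needs to know the size of its own piece
           (displs only matters on the root). */
        MPI_Bcast(counts, size, MPI_INT, 0, MPI_COMM_WORLD);

        int mine = counts[rank];
        double *local = malloc(mine * sizeof *local);

        /* Worker phase: all ranks, including rank 0, compute on their piece. */
        MPI_Scatterv(data, counts, displs, MPI_DOUBLE,
                     local, mine, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        for (int i = 0; i < mine; ++i)
            local[i] *= 2.0;       /* placeholder computation */
        MPI_Gatherv(local, mine, MPI_DOUBLE,
                    result, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        free(local);
        /* Rank 0 now holds the gathered results and starts the next master phase. */
    }

    free(counts); free(displs); free(data); free(result);
    MPI_Finalize();
    return 0;
}
```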
I want to know why a context switch is slow compared to asynchronous operations on the same thread.
Why is it better to run N threads (with N equal to the number of cores), each one processing M clients asynchronously, instead of running M threads? I've been told the reason is the context switch overhead, but I can't find how slow context switches are.
Just to clarify, I will assume that when you say “instead of running M threads” you mean N*M threads (if you really ran only M threads, each one would need to process N clients to cover the same total number of clients, and that would be a similar case).
So the difference between N threads running on N cores, each one processing M clients, and N*M threads running on the same number of cores is that in the first case you won't have to create new threads and, as you said, you won't have context switching between them. This is an advantage because creating OS threads is relatively heavy work: the OS has to allocate a new stack, set up kernel bookkeeping structures, etc. Besides, if you have more threads the OS scheduler will keep stopping and resuming them, which is also time-consuming. And every time the scheduler changes the thread assigned to a core, much of that core's cached state becomes useless, adding a lot of cache misses and consequently more time.
On the other hand, if you have a fixed number of threads, equal to the number of cores (sometimes even N-1 is suggested), you can manage the “tasks” or clients in a user-level scheduler, which may add a little extra computation to your program but avoids a lot of OS process and memory management work, making the overall execution faster. Some current parallel APIs, such as the .NET Task Parallel Library (TPL), OpenMP, Intel's Threading Building Blocks, or Cilk, embody this model of parallelism, called dynamic multithreading.
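For example, with OpenMP (one of the APIs named above) you keep one OS thread per core and hand the per-client work to the runtime's task scheduler; a minimal sketch with placeholder names:

```c
#include <omp.h>

/* Placeholder for the per-client work. */
static void handle_client(int client_id) {
    /* ... asynchronous-style work for one client ... */
    (void)client_id;
}

int main(void) {
    const int n_clients = 10000;   /* placeholder: N*M clients in total */

    /* One OS thread per core (the default); no thread is created per client. */
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n_clients; ++i) {
            #pragma omp task firstprivate(i)
            handle_client(i);      /* a user-level task, scheduled by the runtime */
        }
    }                              /* all tasks have finished by the end of the region */
    return 0;
}
```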
Is there a way to internalize the creation of MPI processes? Instead of specifying the number of processes on the command line, "mpiexec -np 2 ./[PROG]", I would like the number of processes to be specified internally.
Cheers
Yes. You're looking for MPI_Comm_spawn() from MPI-2, which launches a (possibly different) program with a number of processes that can be specified at runtime, and creates a new communicator which you can use in place of MPI_COMM_WORLD to communicate amongst both the original and the new processes.
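A minimal sketch (the worker program name and process count here are placeholders):

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int n_workers = 4;                 /* decided at runtime, not on the command line */
    MPI_Comm intercomm;
    int errcodes[4];                   /* one entry per spawned process */

    /* Launch n_workers copies of ./worker (placeholder program name). */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, n_workers,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, errcodes);

    /* intercomm now connects this process to the spawned workers; use it
       instead of MPI_COMM_WORLD (or merge with MPI_Intercomm_merge). */

    MPI_Finalize();
    return 0;
}
```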