How are stdin and stdout made unique to the process? - unix

Stdin and stdout are single files that are shared by multiple processes to take in input from the users. So how does the OS make sure that only the input give to a particular program is visible in the stdin for than program?

Your assumption that stdin/stdout (while having the same logical name) are shared among all processes is wrong at best.
stdin/stdout are logical names for open files that are forwarded (or initialized) by the process that has started a given process. Actually, with the standard fork-and-exec pattern the setup of those may occur already in the new process (after fork) before exec is being called.
stdin/stdout usually are just inherited from parent. So, yes there exist groups of processes that share stdin and/or stdout for a given filenode.
Also, as a filedescriptor may be a side of a pipe, you need not have file from a filesystem (or a device node) linked to any of the well known standard channels (you also should include stderr into your considerations).
The normal way of setup is:
the parent (e.g. your shell) is calling fork
the forked process (child) is setting up environment, standard I/O channels and anything else.
the child then executes exec to overlay the process with the target image to be executed.
When setting up: it either will keep the existing channels or replace them with new ones e.g. creating a pipe and linking the endpoints appropriately. (To be honest, creating the pipe need to happen before the fork in that simplified description)
This way, most of the process have their own I/O channels.
Nevertheless, multiple processes may write into a channel they are connected to (have a valid filedescriptor to). When reading each junk of data (usually lines with terminals or blocks with files) is being read by a single reader only. So if you have several (running) processes reading from a terminal as stdin, only one will read your typing, while the other(s) will not see this typing at all.

Related

MPI one-sided file I/O

I have some questions on performing File I/Os using MPI.
A set of files are distributed across different processes.
I want the processes to read the files in the other processes.
For example, in one-sided communication, each process sets a window visible to other processors. I need the exactly same functionality. (Create 'windows' for all files and share them so that any process can read any file from any offset)
Is it possible in MPI? I read lots of documentations about MPI, but couldn't find the exact one.
The simple answer is that you can't do that automatically with MPI.
You can convince yourself by seeing that MPI_File_open() is a collective call taking an intra-communicator as first argument and returning a file handler to the opened file as last argument. In this communicator, all processes open the file and therefore, all processes must see the file. So unless a process sees a file, it cannot get a MPI_file handler to access it.
Now, that doesn't mean there's no solution. A possibility could be to do by hand exactly what you described, namely:
Each MPI process opens individually the file they see and are responsible of; then
Each of theses processes reads this local file into a buffer;
Theses individual buffers are all exposed, using either a global MPI_Win memory windows, or several individual ones, ready for one-sided read accesses; and finally
All read accesses to any data that were previously stored in these individual local files, are now done through MPI_Get() calls using the memory window(s).
The true limitation of this approach is that it requires to fully read all of the individual files, therefore, you need to have sufficient memory per node for storing each of them. I'm well aware that this is a very very big caveat that could just make the solution completely impractical. However, if the memory is sufficient, this is an easy approach.
Another even simpler solution would be to store the files into a shared file system, or having them all copied on all local file systems. I imagine this isn't an option since the question wouldn't have been asked otherwise...
Finally, in last resort, a possibility I see would be to dedicate a MPI process (or an OpenMP thread of a MPI process) per node to serve each files. This process would just act as a "file server", answering "read" request coming from the other MPI processes, and serving them by reading the requested data from the file, and sending it back via MPI. It's a bit lengthy to write, but it should work.

How to see the process table in unix?

What's the UNIX command to see the processes table, remember that table contains:
process status
pointers
process size
user ids
process ids
event descriptors
priority
etc
The "process table" as such lives in the kernel's memory. Some systems (such as AIX, Solaris and Linux--which is not "unix") have a /proc filesystem which makes those tables visible to ordinary programs. Without that, programs such as ps (on very old systems such as SunOS 4) required elevated privileges to read the /dev/kmem (kernel memory) special device, as well as having detailed knowledge about the kernel memory layout.
Your question is open ended, and an answer to a specific question you may have had can be looked up in any man page as #Alfasin suggests in his answer. A lot depends on what you are trying to do.
As #ThomasDickey points out in his response, in UNIX and most of its' derivatives, the command for viewing processes being run in the background or foreground is in fact the ps command.
ps stands for 'process status', answering your first bullet item. But the command uses over 30 options and depending on what information you seek, and permissions granted to you by the systems administrator, you can get various types of information from the command.
For example, for the second bullet item on your list above, depending on what you are looking for, you can get information on 3 different types of pointers - the session pointer (with option 'sess'), the terminal session pointer (tsess), and the process pointer (uprocp).
The rest of your items that you have listed are mostly available as standard output of the command.
Some UNIX variants implement a view of the system process table inside of the file system to support the running of programs such as ps. This is normally mounted on /proc (see #ThomasDickey response above)
Typical reasons for understanding the working of the command include system-administration responsibilities such as tracking the origin of the initiated processes, killing runaway or orphaned processes, examining the file size of the process and setting limits where necessary, etc. UNIX developers can also use it in conjunction with ipc features, etc. An understanding of the process table and status will help with associated UNIX features such as the kvm interface to examine crash dump, etc. or to get or set the kernal state.
Hope this helps

Confusion as to how fork() and exec() work

Consider the following:
Where I'm getting confused is in the step "child duplicate of parent". If you're running a process such as say, skype, if it forks, is it copying skype, then overwriting that process copy with some other program? Moreover, what if the child process has memory requirements far different from the parent process? Wouldn't assigning the same address space as the parent be a problem?
I feel like I'm thinking about this all wrong, perhaps because I'm imagining the processes to be entire programs in execution rather than some simple instruction like "copy data from X to Y".
All modern Unix implementations use virtual memory. That allows them to get away with not actually copying much when forking. Instead, their memory map contains pointers to the parent's memory until they start modifying it.
When a child process exec's a program, that program is copied into memory (if it wasn't already there) and the process's memory map is updated to point to the new program.
fork(2) is difficult to understand. It is explained a lot, read also fork (system call) wikipage and several chapters of Advanced Linux Programming. Notice that fork does not copy the running program (i.e. the /usr/bin/skype ELF executable file), but it is lazily copying (using copy-on-write techniques - by configuring the MMU) the address space (in virtual memory) of the forking process. Each process has its address space (but might share some segments with some other processes, see mmap(2) and execve(2) ....). Since each process has its own address space, changes in the address space of one process does not (usually) affect the parent process. However, processes may have shared memory but then need to synchronize: see shm_overview(7) & sem_overview(7)...
By definition of fork, just after the fork syscall the parent and child processes have nearly equal state (in particular the address space of the child is a copy of the address space of the parent). The only difference being the return value of fork.
And execve is overwriting the address space and registers of the current process.
Notice that on Linux all processes (with a few exceptions, like kernel started processes such as /sbin/modprobe etc) are obtained by fork-ing -from the initial /sbin/init process of pid 1.
At last, system calls -listed in syscalls(2)- like fork are an elementary operation from the application's point of view, since the real processing is done inside the Linux kernel. Play with strace(1). See also this answer and that one.
A process is often some machine state (registers) + its address space + some kernel state (e.g. file descriptors), etc... (but read about zombie processes).
Take time to follow all the links I gave you.

MPI, NFS File Writing

I'm having an issue with a MPI program running across a group of Linux nodes. The group is currently set up with NFS, with /home/mpi mounted across all nodes. The problem is that the program requires all of the nodes to open a file in the file system in write mode (use fopen on /home/mpi/file), and write to while it does calculations. One node will be able to open it, and the others won't and will throw an error. Instead I want each node to have its own file to write to.
I was wondering if there was a way to get around this. I was thinking about making a separate file for each node, with the nodes rank appended to the filename, but was wondering if there were simpler ways to get around this issue. Is there a way to set up the group so that all the worker nodes have their own copy of the /home/mpi directory that is auto-updated with any changes that the master node does to its copy?
Thanks.
As far as I know, the standard way of doing things is to open one file per node, indexed by rank as you described. Depending on what these files are used for (e.g. logging), you then have to write a script to re-combine them at the end of the computation.
If you really need all processes to write to the same file on the filesystem, you'll have to somehow coordinate concurrent outputs from all processes wanting to write to the file.
There is no way to do this at the filesystem level as far as I know, but you can do this within you MPI code. The standard, historical implementation of this is to have all MPI processes send messages to rank 0, which is in charge of effectively writing them to the filesystem.
Another option would be to look at the IO features introduced in MPI2, which allow all processes to work on different parts of the same file.

Why only related processes can only communicate using pipe() (IPC)?

why does there is limitation that with pipe() only parent and child process can communicate, why not unrelated processes?
why can't two children of a process can't communicate using pipe()?
There do have limitation.
Pipe use fd to read/write data, fd is per-process, a process maintain a fd table, child inherit the fd table when fork, and each inherited fd refer to the same open file as in parent process, which is maintained by kernel.
Processes that communicate via the same pipe should be related.
It means, the 2 processes should both aware of the 2 fd of the pipe.
<TLPI> says:
The pipe should be created by a common ancestor before the series of fork() calls that led to the existence of the processes.
There is no such limitation. Any two processes which have a means of obtaining references to each end of the pipe can communicate. A process can even communicate with itself using a pipe.
Any process could obtain a reference to one of the ends of a pipe using any of the following generic means of communicating file descriptors between processes. Pipes are not special in this respect.
The process itself called pipe() and obtained file descriptors for both ends.
The process received the file descriptor as SCM_RIGHTS ancillary data through a socket.
The process obtained the file descriptor from another arbitrary process using platform-specific means like /proc/<pid>/fd on Linux.
(There might be other methods.)
The process inherited the file descriptor from an ancestor (direct or indirect) that obtained it using one of the aforementioned methods.

Resources