text section in process memory map - unix

Normally, the memory map of a process consists of the stack, text, data+bss and heap sections.
Each process's memory addresses are independent of those of other processes, except for the text section.
My question is about the text section: can only a child process share the same text section with its parent, or can other processes share it too?
======================================================================
#avd: yes, refer to the wikipedia
http://en.wikipedia.org/wiki/Process_isolation
"Process isolation can be implemented by with virtual address space, where process A's address space is different from process B's address space - preventing A to write into B."
This is what I mean by each process having its own memory map.
However, the OS book I am reading mentions that the text section can be shared. So I am not clear on this, or perhaps I misunderstood that part of the book.
======================================================================
Extra information:
http://www.hep.wisc.edu/~pinghc/Process_Memory.htm
Processes share the text segment if a second copy of the program is to be executed concurrently. In this setting, the system references the previously loaded text segment with the pointer rather than reloading a duplicated. If needed, shared text, which is the default when using the C/C++ compiler, can be turned off by using the -N option on the compile time.

Every process has its very own virtual address space. That address space is not shared with anybody, including child processes. But its virtual addresses are translated, or in other words mapped, to physical addresses by the OS kernel and the MMU.
The thing is that virtual addresses from different address spaces can point to the same physical addresses! For example, when a process forks, the child gets its own virtual address space, but as long as neither process writes to a page, both share the same physical memory for reading. When the child process (or the parent) tries to modify some memory, the kernel creates a separate private copy of that particular page for the writer, and the page is no longer shared (until the child process itself forks). This is known as Copy on Write (CoW).
So the real answer is that the text section can be shared by mapping different virtual pages to the same physical pages (called frames). This is not limited to parent and child: any processes running the same executable or shared library can have their text pages backed by the same frames.
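The visible effect of Copy on Write can be sketched in a few lines of Python. This is only an illustration under assumptions: it requires a Unix system with os.fork, the function name fork_and_modify is made up for this example, and it shows the semantics (a write in the child stays private to the child), not the underlying sharing of physical frames, which cannot be observed this way.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, and compare both views."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write goes to the child's private copy of the page.
        os.close(read_fd)
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os._exit(0)
    # Parent: read what the child saw, then look at our own buffer.
    os.close(write_fd)
    child_view = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify()
    print("parent sees:", parent_view)  # parent sees: b'parent'
    print("child saw:  ", child_view)   # child saw:   b'child!'
```

Both processes use the same virtual address for data, yet the parent's contents are unchanged after the child's write.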

Related

Layout of ELF binary in virtual memory

All modern *nix operating systems use the virtual memory concept (with paging). As far as I know, virtual memory sets a layer of abstraction between the programmer and the real physical memory: the programmer is not limited by the RAM size and can see the program as a large contiguous space of data, instructions, heap and stack (and manipulate pointers according to that concept). When we compile and link source code we get an executable file stored on disk, known as an ELF file, which contains all the data and instructions of the program besides some additional information such as stack and heap sizes (which are only created at runtime).
Now my questions:
1. How is this binary file (ELF) mapped to virtual memory?
2. Does every process have its own virtual memory (a page file!)?
3. What is the program's layout after being mapped to virtual memory?
4. What exactly is the preferred base address and how does it look in virtual memory?
5. What is the difference between an RVA and an offset?
You don't have to answer all the questions or give detailed answers; instead you can point me to good, thorough reading on the subject. Thanks.
How is this binary file (ELF) mapped to virtual memory?
The executable file contains instructions to the loader on how to lay out the address space. On some systems, parts of the executable can be mapped to memory and serve as a page file.
Does every process have its own virtual memory (a page file!)?
Every process has its own logical address space. Some areas within that address space may be shared with other processes.
What is the program's layout after being mapped to virtual memory?
That depends upon the system and what the executable told the loader to do.
What exactly is the preferred base address and how does it look in virtual memory?
That is just the desirable start location for loading something in memory. Most compilers generate relocatable code that is not tied to any specific logical address.
What is the difference between an RVA and an offset?
RVA (relative virtual address) is a term from the Windows PE format rather than from ELF; it is essentially an offset. What is not clear in your question is what type of offset you are talking about. There are byte offsets within pages and byte offsets within the file. An RVA is usually an offset from the location where the image was loaded, and it can span pages.
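On Linux, the layout the loader actually produced can be inspected through /proc/self/maps, where each line gives an address range, permissions, the offset into the backing file, and the path. The following is a minimal sketch of parsing that format; the names Mapping and parse_maps and the sample text (including its addresses and inode numbers) are hypothetical and only for illustration.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int    # first virtual address of the region
    end: int      # one past the last virtual address
    perms: str    # e.g. "r-xp" for private, executable text
    offset: int   # offset into the backing file
    path: str     # backing file, "[stack]", "[heap]", or ""


def parse_maps(text):
    """Parse text in the /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(
            Mapping(start, end, fields[1], int(fields[2], 16), path)
        )
    return mappings


# Hypothetical sample in the same format as /proc/self/maps.
SAMPLE_MAPS = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
0060a000-0060b000 rw-p 0000a000 08:01 1234 /bin/cat
7ffd1000-7ffd3000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    for m in parse_maps(SAMPLE_MAPS):
        print(f"{m.start:#x}-{m.end:#x} {m.perms} "
              f"file offset {m.offset:#x} {m.path}")
```

On a real Linux system, passing open("/proc/self/maps").read() to parse_maps lists the text (r-xp) and data (rw-p) mappings of the running interpreter; the offset column is a file offset, while the address range is the virtual location chosen at load time.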

Confusion as to how fork() and exec() work

Consider the following:
Where I'm getting confused is in the step "child duplicate of parent". If you're running a process such as say, skype, if it forks, is it copying skype, then overwriting that process copy with some other program? Moreover, what if the child process has memory requirements far different from the parent process? Wouldn't assigning the same address space as the parent be a problem?
I feel like I'm thinking about this all wrong, perhaps because I'm imagining the processes to be entire programs in execution rather than some simple instruction like "copy data from X to Y".
All modern Unix implementations use virtual memory. That allows them to get away with not actually copying much when forking. Instead, their memory map contains pointers to the parent's memory until they start modifying it.
When a child process exec's a program, that program is copied into memory (if it wasn't already there) and the process's memory map is updated to point to the new program.
fork(2) is difficult to understand. It is explained in many places; read also the fork (system call) Wikipedia page and several chapters of Advanced Linux Programming. Notice that fork does not copy the running program (i.e. the /usr/bin/skype ELF executable file); it lazily copies (using copy-on-write techniques, by configuring the MMU) the address space (in virtual memory) of the forking process. Each process has its own address space (but might share some segments with other processes, see mmap(2) and execve(2) ....). Since each process has its own address space, changes in the address space of one process do not (usually) affect the parent process. However, processes may have shared memory, but then they need to synchronize: see shm_overview(7) & sem_overview(7)...
By definition of fork, just after the fork syscall the parent and child processes have nearly equal state (in particular the address space of the child is a copy of the address space of the parent). The only difference being the return value of fork.
And execve is overwriting the address space and registers of the current process.
Notice that on Linux all processes (with a few exceptions, such as kernel-started processes like /sbin/modprobe) are obtained by fork-ing, ultimately from the initial /sbin/init process of pid 1.
Finally, system calls (listed in syscalls(2)) like fork are elementary operations from the application's point of view, since the real processing is done inside the Linux kernel. Play with strace(1). See also this answer and that one.
A process is often some machine state (registers) + its address space + some kernel state (e.g. file descriptors), etc... (but read about zombie processes).
Take time to follow all the links I gave you.
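The fork-then-exec sequence described in these answers can be sketched as follows. This is an illustrative example only: the helper name run_in_child is made up, it assumes a Unix system, and the exit code 127 for a failed exec is just a common shell convention adopted here.

```python
import os
import sys


def run_in_child(argv):
    """Fork, replace the child's address space with argv, return exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: at this point it is a copy of the parent's address space.
        try:
            # execvp discards that copy and loads the new program instead.
            os.execvp(argv[0], argv)
        except OSError:
            # Only reached if the exec itself failed.
            os._exit(127)
    # Parent: its own address space is untouched; wait for the child.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = run_in_child([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)  # child exited with 7
```

The memory requirements of the new program do not matter to the parent: after the exec, the child's memory map describes only the new program.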

virtual address

Suppose I'm starting two instances of the same program. Will the text region of both programs have same virtual addresses?
Depends. On most systems, if you run the same program twice in the same environment (same parameters, etc.), you'll find the same address mapping. This is simply because most of what the process does is deterministic, dependent only on the environment, command-line parameters, contents of files read, but not on changing data such as the date or process ID. This is very useful when debugging: if you restart your program, sometimes even after a small code change and recompilation, you have a chance that the memory layout remained the same. Of course, different instances of the program running concurrently may have the same virtual addresses, but their writable pages won't map to the same physical addresses (read-only pages such as the text can be backed by the same physical frames).
Some systems, such as OpenBSD, or Linux with various hardening settings, implement address space layout randomization (ASLR). ASLR means that each time a process starts, the virtual addresses of its code, data, stack(s) and heap(s) are determined at random. This is a security feature, designed to make exploits of security vulnerabilities harder: the exploit code can't just access known code at known addresses. However, as ASLR becomes more popular, exploits also become more sophisticated to work around it. ASLR remains useful because it increases the workload for the exploit writer without adding a lot of complexity.
Probably not, but it's possible that they could. Each process has its own independent memory space.
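One way to check this on a given machine is to start the same program twice and compare where its first mapping landed. The sketch below assumes Linux (it reads /proc/self/maps in the child), and the name first_mapping_base is made up for this example. Whether the two addresses match depends on the ASLR settings and on whether the executable was built position-independent, so no particular result is claimed.

```python
import subprocess
import sys

# Code run in each fresh process: print the start address of its first mapping.
CHILD_CODE = "print(open('/proc/self/maps').readline().split('-')[0])"


def first_mapping_base():
    """Start a new interpreter and return the base of its first mapping."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip(), 16)


if __name__ == "__main__":
    first = first_mapping_base()
    second = first_mapping_base()
    print(hex(first), hex(second))
    print("same virtual address" if first == second else "different addresses")
```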

where does the shared memory get allocated?

In Linux, when we share data between two or more processes using shared memory, where does the shared memory get allocated?
Will it become part of the process address space at run time, given that a process cannot access memory outside its address space?
Could someone please clarify?
When you have shared memory, then that memory gets mapped into the virtual address space of each process that shares the memory (not necessarily at the same virtual addresses in each process). The virtual memory manager ensures that the virtual addresses both map to the same physical addresses so that the sharing actually happens.
Assuming System V: one process takes memory which is allocated inside its process space and makes it available to others via IPC. The most common way to share it is to map the memory into the other processes' virtual address spaces, in which case they can access the memory as though it were part of their own address space.
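A small sketch of the same idea, using an anonymous shared mapping rather than the System V API: the mapping is created before fork, so both processes have it in their address space and both map it to the same physical pages. It assumes a Unix system, where Python's mmap.mmap(-1, length) creates an anonymous MAP_SHARED mapping by default; the function name share_counter_with_child is made up for this example.

```python
import mmap
import os
import struct


def share_counter_with_child():
    """Let a forked child write into a shared mapping; return what we read."""
    # Anonymous mapping; on Unix the default flag is MAP_SHARED.
    shared = mmap.mmap(-1, mmap.PAGESIZE)
    struct.pack_into("i", shared, 0, 1)
    pid = os.fork()
    if pid == 0:
        # Child: the write goes to the shared physical page, not a private copy.
        struct.pack_into("i", shared, 0, 42)
        os._exit(0)
    os.waitpid(pid, 0)
    (value,) = struct.unpack_from("i", shared, 0)
    shared.close()
    return value


if __name__ == "__main__":
    print(share_counter_with_child())  # 42
```

Unlike ordinary memory after fork, the child's write is visible to the parent here because the region is shared instead of copy-on-write. Waiting for the child before reading stands in for the synchronization that real shared-memory code needs.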

Determine who/what reserved 5.5 GB of virtual memory in w3wp.exe

On my machine (XP, 64) the ASP.net worker process (w3wp.exe) always launches with 5.5GB of Virtual Memory reserved. This happens regardless of the web application it's hosting (it can be anything, even an empty web page in aspx).
This big old chunk of virtual memory is reserved at the moment the process starts, so this isn't a gradual memory "leak" of some sort.
Some snooping around with windbg shows that the memory in question is Private, Reserved and RegionUsageIsVAD, which indicates it might be the work of someone calling VirtualAlloc. It also shows that the memory in question is allocated/reserved in 4 big chunks of 1GB each and several smaller ones (1/4GB each).
So I guess I need to figure out who's calling VirtualAlloc and reserving all this memory. How do I do that?
Attaching a debugger to the process prior to the memory allocation is tricky, because w3wp.exe is a process launched by svchost.exe (that is, IIS/ASP.Net filter), and if I try to launch it myself in order to debug it, it just closes down without all this profuse memory reservation. Also, the command line parameters are invalid if I reuse them (which makes sense because it's a pipe created by the calling process).
I can attach windbg to the process after the fact (which is how I found the memory regions in question), but I'm not sure it's possible at that point to determine who allocated what.
David Wang answers this to a similar question:
[...] the ASP.Net performance developer tells me that:
The Reserved virtual memory is nothing to worry about. You can view it as performance/caching prerequisite of the CLR. And heavy load testing shows that it is nothing to worry about.
System.Windows.Forms - It's not pulled in by empty hello world ASPX page. You can use Microsoft Debugging Tools and "sx e ld system.windows.forms" to identify what is actually pulling it in at runtime. Or you can ildasm to find the dependency.
mscorlib - make sure it is GAC'd and NGen'd properly.
Virtual memory is just the address space allocated to the process. It has nothing to do with memory usage.
See:
Virtual Memory
Pushing the Limits of Windows: Virtual Memory
http://support.microsoft.com/kb/555223
Reserved memory is very different from allocated memory. Reserving memory just allocates address space. It doesn't commit any physical pages.
This address space is likely allocated by IIS for its heap. It will only commit pages when needed.
If you really want to launch w3wp.exe from windbg, you probably need to launch it with valid command-line arguments. You can use Process Explorer to determine what the command line for the current w3wp.exe process is. For instance, on my server, mine was:
c:\windows\system32\inetsrv\w3wp.exe -a \\.\pipe\iisipmeca56ca2-3a28-452a-9ad3-9e3da7b7c765 -t 20 -ap "DefaultAppPool"
I'm not sure what the UID in there specifies, but it looks like it's probably generated on the fly by the W3SVC service (which is what launched w3wp.exe) to name the pipe specified there. So you should definitely look at your command line before launching w3wp from windbg.
