In Linux, when we are sharing data between two or more processes using shared memory, where does the shared memory get allocated?
Will it become part of the process address space at run time? I ask because a process cannot access memory outside its own address space.
Could someone please clarify?
When you have shared memory, that memory gets mapped into the virtual address space of each process that shares it (not necessarily at the same virtual address in each process). The virtual memory manager ensures that those virtual addresses all map to the same physical addresses, so that the sharing actually happens.
Assuming System V: one process allocates memory inside its own address space and makes it available to others via IPC. The most common way to share it is to map the memory into the other processes' virtual address spaces, in which case they can access the memory as though it were part of their own address space.
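For illustration, here is a minimal System V sketch (error handling omitted; the key derived from /tmp is just an example). Two processes running this code attach the same segment, and each may see a different virtual address for the same physical pages:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        key_t key = ftok("/tmp", 'A');                /* agreed-upon key */
        int id = shmget(key, 4096, IPC_CREAT | 0600); /* create or find the segment */
        char *mem = shmat(id, NULL, 0);               /* map it into this process */

        strcpy(mem, "hello via shared memory");
        printf("segment mapped at %p in this process\n", (void *)mem);

        shmdt(mem);                                   /* unmap; the segment persists */
        return 0;
    }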
Related
Is there a way to identify whether an object is stored on the stack or the heap solely from its memory address? I ask because it would be useful to know this when debugging, when a memory address comes up in an error.
For instance:
If I have a memory address: 0x7fd8507c6
Can I determine anything about the object based on this address?
You don't mention which OS you are using. I'll answer for Microsoft Windows as that's the one I've been using for the last 25 years. Most of what I knew about Unix/Linux I've forgotten.
If you just have the address and no other information - for 32 bit Windows you can tell if it's user space (lower 2GB) or kernel space (upper 2GB), but that's about it (assuming you don't have the /3GB boot option).
If you have the address and you can run some code you can use VirtualQuery() to get information about the address. If you get a non-zero return value you can use the data in the returned MEMORY_BASIC_INFORMATION structure.
The State, Type, and Protect values will tell you about the possible uses for the memory - whether it's memory-mapped, a DLL ((Type & MEM_IMAGE) != 0), etc. You can't infer from this information whether the memory is a thread's stack or part of a heap. You can, however, determine if the address is in memory that isn't heap or stack (memory in a DLL is not in a stack or heap; non-accessible memory isn't in a stack or a heap either).
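A small sketch of that approach (the function name is mine; error handling kept minimal):

    #include <windows.h>
    #include <stdio.h>

    /* Classify what VirtualQuery can tell us about an arbitrary address. */
    void describe_address(const void *addr)
    {
        MEMORY_BASIC_INFORMATION mbi;
        if (VirtualQuery(addr, &mbi, sizeof(mbi)) == 0) {
            printf("VirtualQuery failed for %p\n", addr);
            return;
        }
        printf("base %p, region size %zu bytes\n", mbi.BaseAddress, mbi.RegionSize);
        if (mbi.State == MEM_FREE)    printf("free - not stack, not heap\n");
        if (mbi.State == MEM_RESERVE) printf("reserved but not committed\n");
        if (mbi.State == MEM_COMMIT) {
            if (mbi.Type & MEM_IMAGE)   printf("image (EXE/DLL) - not stack or heap\n");
            if (mbi.Type & MEM_MAPPED)  printf("memory-mapped file\n");
            if (mbi.Type & MEM_PRIVATE) printf("private - stack or heap candidate\n");
        }
    }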
To determine where a thread stack is, you could examine all pages in the application looking for a guard page at the end of a thread's stack. You can then infer the location of the stack space using the default stack size stored in the PE header (or, if you can't read that, just use the default size of 1MB - few people change it) and check whether the address you have falls in the space you have inferred.
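A rough sketch of that scan, walking the whole user address space and reporting committed pages with PAGE_GUARD set:

    #include <windows.h>
    #include <stdio.h>

    /* Report guard pages; each thread stack normally ends in one. */
    void find_guard_pages(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        char *addr = (char *)si.lpMinimumApplicationAddress;
        MEMORY_BASIC_INFORMATION mbi;
        while (addr < (char *)si.lpMaximumApplicationAddress &&
               VirtualQuery(addr, &mbi, sizeof(mbi)) != 0) {
            if (mbi.State == MEM_COMMIT && (mbi.Protect & PAGE_GUARD))
                printf("guard page at %p - possible stack limit\n", mbi.BaseAddress);
            addr = (char *)mbi.BaseAddress + mbi.RegionSize; /* next region */
        }
    }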
To determine if the address is in a memory heap you'd need to enumerate the process heaps (GetProcessHeaps()) and then walk each heap found (HeapWalk()), checking the areas being managed by the heap. Not all Windows heaps can be enumerated.
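Sketched out (with the caveat from above that not all heaps can be enumerated):

    #include <windows.h>

    /* Return TRUE if addr lies inside an allocated block of any process heap. */
    BOOL address_in_heap(const void *addr)
    {
        HANDLE heaps[256];
        DWORD n = GetProcessHeaps(256, heaps);
        if (n > 256) n = 0; /* more heaps than our buffer holds; nothing was copied */

        for (DWORD i = 0; i < n; i++) {
            PROCESS_HEAP_ENTRY entry;
            entry.lpData = NULL;
            if (!HeapLock(heaps[i]))
                continue;
            while (HeapWalk(heaps[i], &entry)) {
                if ((entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) &&
                    (char *)addr >= (char *)entry.lpData &&
                    (char *)addr <  (char *)entry.lpData + entry.cbData) {
                    HeapUnlock(heaps[i]);
                    return TRUE;
                }
            }
            HeapUnlock(heaps[i]);
        }
        return FALSE;
    }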
To get any further than this you need to have tracked allocations/deallocations etc to each heap and have all that information stored for use.
You could also track when threads are created/destroyed/exited and calculate the thread stack addresses that way.
That's a broad brush answer informed by my experience creating a memory leak detection tool (which needs this information) and numerous other tools.
I am fairly new to OpenCL. I have understood everything up until now, but I am having trouble understanding how buffer objects work.
I haven't understood where a buffer object is stored. In this StackOverflow question it is stated that:
If you have one device only, probably (99.99%) is going to be in the device. (In rare cases it may be in the host if the device does not have enough memory for the time being)
To me, this means that buffer objects are stored in device memory. However, as is stated in this StackOverflow question, if the flag CL_MEM_ALLOC_HOST_PTR is used in clCreateBuffer, the memory used will most likely be pinned memory. My understanding is that, when memory is pinned it will not be swapped out. This means that pinned memory MUST be located in RAM, not in device memory.
So what is actually happening?
What I would like to know is what the following flags:
CL_MEM_USE_HOST_PTR
CL_MEM_COPY_HOST_PTR
CL_MEM_ALLOC_HOST_PTR
imply about the location of the buffer.
Thank you
Let's first have a look at the signature of clCreateBuffer:
    cl_mem clCreateBuffer(cl_context context,
                          cl_mem_flags flags,
                          size_t size,
                          void *host_ptr,
                          cl_int *errcode_ret)
There is no argument here that would tell the OpenCL runtime which device's memory the buffer should be put in, as a context can have multiple devices. The runtime only knows once we use a buffer object, e.g. read/write from/to it, because those operations need a command queue that is connected to a specific device.
Every memory object can reside in either host memory or one of the context's devices' memories, and the runtime may migrate it as needed. So, in general, every memory object might have a piece of internal host memory within the OpenCL runtime. What the runtime actually does is implementation-dependent, so we cannot make many assumptions and get no portable guarantees. That means everything about pinning etc. is implementation-dependent, and you can only hope for the best and avoid patterns that will definitely prevent the use of pinned memory.
Why do we want pinned memory?
Pinned memory means that the virtual address of our memory page in our process' address space has a fixed translation into a physical memory address in RAM. This enables DMA (Direct Memory Access) transfers (which operate on physical addresses) between the device memory of a GPU and the CPU memory over PCIe. DMA lowers the CPU load and possibly increases copy speed. So we want the internal host storage of our OpenCL memory objects to be pinned, to increase the performance of data transfers between the internal host storage and the device memory of an OpenCL memory object.
As a basic rule of thumb: if your runtime allocates the host memory, it might be pinned. If you allocate it in your application code, the runtime will pessimistically assume it is not pinned - which usually is a correct assumption.
CL_MEM_USE_HOST_PTR
This allows us to provide memory to the OpenCL implementation to use as the internal host storage of the object. It does not mean that the memory object will not be migrated into device memory when we call a kernel. As that memory is user-provided, the runtime cannot assume it is pinned. This might lead to an additional copy between the un-pinned internal host storage and a pinned buffer prior to device transfer, to enable DMA for host-device transfers.
CL_MEM_ALLOC_HOST_PTR
We tell the runtime to allocate host memory for the object. It could be pinned.
CL_MEM_COPY_HOST_PTR
We provide host memory to copy-initialise our buffer from, not to use it internally. We can also combine it with CL_MEM_ALLOC_HOST_PTR. The runtime will allocate memory for internal host storage. It could be pinned.
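Putting the three flags side by side in a sketch (assuming an existing context ctx and that data is filled elsewhere; error checks omitted):

    cl_int err;
    size_t size = 1024 * sizeof(float);
    float data[1024];

    /* Our memory becomes the internal host storage (must stay valid). */
    cl_mem a = clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR, size, data, &err);

    /* The runtime allocates host memory itself - it could be pinned. */
    cl_mem b = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR, size, NULL, &err);

    /* Runtime-allocated host storage, copy-initialised from data. */
    cl_mem c = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR,
                              size, data, &err);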
Hope that helps.
The specification is (deliberately?) vague on the topic, leaving a lot of freedom to implementors. So unless an OpenCL implementation you are targeting makes explicit guarantees for the flags, you should treat them as advisory.
First off, CL_MEM_COPY_HOST_PTR actually has nothing to do with allocation, it just means that you would like clCreateBuffer to pre-fill the allocated memory with the contents of the memory at the host_ptr you passed to the call. This is as if you called clCreateBuffer with host_ptr = NULL and without this flag, and then made a blocking clEnqueueWriteBuffer call to write the entire buffer.
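In other words, assuming an existing context and queue and size bytes of initialised host memory at host_data, these two are equivalent in effect:

    /* (a) copy-initialise at creation time */
    cl_mem a = clCreateBuffer(context, CL_MEM_COPY_HOST_PTR, size, host_data, &err);

    /* (b) create uninitialised, then do a blocking write */
    cl_mem b = clCreateBuffer(context, 0, size, NULL, &err);
    clEnqueueWriteBuffer(queue, b, CL_TRUE, 0, size, host_data, 0, NULL, NULL);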
Regarding allocation modes:
CL_MEM_USE_HOST_PTR - this means you've pre-allocated some memory, correctly aligned, and would like to use this as backing memory for the buffer. The implementation can still allocate device memory and copy back and forth between your buffer and the allocated memory, if the device does not support directly accessing host memory, or if the driver decides that a shadow copy to VRAM will be more efficient than directly accessing system memory. On implementations that can read directly from system memory though, this is one option for zero-copy buffers.
CL_MEM_ALLOC_HOST_PTR - This is a hint to tell the OpenCL implementation that you're planning to access the buffer from the host side by mapping it into host address space, but unlike CL_MEM_USE_HOST_PTR, you are leaving the allocation itself to the OpenCL implementation. For implementations that support it, this is another option for zero-copy buffers: create the buffer, map it to the host, get a host algorithm or I/O to write to the mapped memory, then unmap it and use it in a GPU kernel. Unlike CL_MEM_USE_HOST_PTR, this leaves the door open for using VRAM that can be mapped directly into the CPU's address space (e.g. PCIe BARs). A sketch of this pattern follows after this list.
Default (neither of the above 2): Allocate wherever most convenient for the device. Typically VRAM, and if memory-mapping into host memory is not supported by the device, this typically means that if you map it into host address space, you end up with 2 copies of the buffer, one in VRAM and one in system memory, while the OpenCL implementation internally copies back and forth between the 2.
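Here is the CL_MEM_ALLOC_HOST_PTR zero-copy pattern from the list above, sketched out (context, queue, and kernel assumed to exist; error handling omitted):

    cl_int err;
    size_t size = 1024 * sizeof(float);

    cl_mem buf = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, size, NULL, &err);

    /* Map into host address space, fill from the host side, unmap. */
    float *ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                             0, size, 0, NULL, NULL, &err);
    for (size_t i = 0; i < 1024; i++)
        ptr[i] = (float)i;
    clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);

    /* On implementations that support it, the kernel now reads the data
       without an extra host-to-device copy. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);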
Note that the implementation may also use any access flags provided (CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, and CL_MEM_READ_WRITE) to influence the decision where to allocate memory.
Finally, regarding "pinned" memory: many modern systems have an IOMMU, and when this is active, system memory access from devices can cause IOMMU page faults, so the host memory technically doesn't even need to be resident. In any case, the OpenCL implementation is typically deeply integrated with a kernel-level device driver, which can typically pin system memory ranges (exclude them from paging) on demand. So if using CL_MEM_USE_HOST_PTR you just need to make sure you provide appropriately aligned memory, and the implementation will take care of pinning for you.
It is known that the address 0 (which the macro NULL denotes) is not legal to access.
I was wondering how the operating system (say Linux) can determine when there is an access to the null address somewhere in the code, without having to check each and every pointer access in the code?
I assume it has something to do with signals, specifically the SIGSEGV signal.
But I'm not sure how it's done.
First of all, a null pointer access is not necessarily invalid. Typically, either the operating system's program loader or the linker (depending upon the system) sets up processes so that the lowest page in the virtual address space is not mapped.
Many systems that do this also allow the application to map the first page, making a null reference valid.
The NULL pointer is checked the same way all other memory addresses are checked: through the logical address translation of the CPU.
Each time the processor accesses memory (ignoring caching) it looks up the address in the process's page table. If there is no corresponding entry, the processor triggers an access fault (that in Unix variants gets translated into a signal).
If there is an entry in the page table for the address, the processor checks the access allowed for the page. If you are in user mode and try to access a kernel protected page, that triggers a fault. If you are trying to write to a read only page, that triggers a fault. If you try to execute a non-executable page, that triggers a fault.
This is a rather lengthy topic. You need to understand logical memory translation (sometimes misnamed virtual memory) if you want to learn more on the topic.
Pointers refer to virtual address space. In the virtual address space, each page of memory can be mapped to real physical memory. The operating system takes care of this mapping separately for each process.
When you access memory through a pointer, the CPU looks at the mapping for the virtual address your pointer specifies and checks if there is real, physical memory behind. Additional checks are done to verify that you have read or write access to that piece of memory, depending on the operation you are attempting.
If there is no memory mapped at that address, the CPU raises a hardware exception (a page fault). The OS handles that exception and - usually - sends SIGSEGV to the calling process.
The zero page containing the NULL address is usually intentionally left unmapped, so that NULL pointer accesses, which usually result from programming errors, are easily trapped.
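To see that in action, here is a minimal Linux sketch that installs a SIGSEGV handler and then dereferences NULL; the kernel reports the faulting address that the MMU trapped:

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        /* fprintf is not async-signal-safe; acceptable for a demonstration. */
        fprintf(stderr, "SIGSEGV at address %p\n", info->si_addr);
        _exit(1);
    }

    int main(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        int *p = NULL;
        *p = 42; /* access to the unmapped zero page -> MMU fault -> SIGSEGV */
        return 0;
    }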
Linux obtains this support from the hardware. The processor is informed about the purpose of individual memory regions and their availability. If an "unavailable" memory region is accessed, the processor informs the operating system about the problem, and the operating system informs the application.
It means two things:
There is no software overhead related to checking all pointers against the NULL value.
There is no precise check for allowed pointer values.
In other words, if your pointer points anywhere into the "available" memory, then the hardware is unable to recognize the problem.
The Memory Management Unit (MMU) plays a key role in triggering the exception when a NULL pointer is dereferenced or an invalid address is accessed.
During the normal virtual-to-physical memory mapping performed by the MMU on each RAM access, the invalid address is simply not found in the range of virtual addresses defined in the MMU descriptors. This can have catastrophic consequences if it occurs in OS kernel space, or just a process kill and cleanup in the user-space domain.
...how is it that the operating system (say linux) can determine when there is an access to null address, somewhere in the code, without having to access each and every pointer address in the code?
Well, the OS cannot determine a NULL dereference without the pointer actually being accessed. From the wiki for segmentation fault:
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. The OS kernel will in response usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal....
The memory access violation is a run-time incident; unless there is an invalid access, there is no way the OS will raise the signal to the process.
FWIW, a process is allowed to access the memory allocated for it (in its virtual address space). Any address outside the allocated virtual address space, if accessed, will generate a fault (through the MMU), which in turn generates the segmentation fault.
TL;DR - SIGSEGV is generated on encountering the NULL-pointer dereference, not before that. Also, the OS does not detect the erroneous access itself; rather, the Memory Management Unit informs the OS by raising a fault.
Normally, a process memory map consists of the stack, text, data+bss, and heap.
Each process's memory map is independent of other processes, except possibly the text section.
My question is about the text section: can only a child process share
the same text section with its parent process, or can other processes share it too?
======================================================================
@avd: yes, refer to Wikipedia:
http://en.wikipedia.org/wiki/Process_isolation
"Process isolation can be implemented by with virtual address space, where process A's address space is different from process B's address space - preventing A to write into B."
This is what I mean by each process having its own memory map.
However, when I read the OS book, it mentions that the text section can be shared, so I am not very clear on this; perhaps I misunderstood that part of the book.
======================================================================
Extra information:
http://www.hep.wisc.edu/~pinghc/Process_Memory.htm
Processes share the text segment if a second copy of the program is executed concurrently. In this setting, the system references the previously loaded text segment with a pointer rather than reloading a duplicate. If needed, shared text, which is the default when using the C/C++ compiler, can be turned off by using the -N option at compile time.
Every process has its very own virtual address space. That virtual address space is not shared with anybody, including child processes. But these virtual addresses are translated or, in other words, mapped to physical addresses by the OS kernel and the MMU.
The thing is that virtual addresses from different address spaces can point to the same physical addresses! For example, when a process forks, the child gets its own virtual address space, but as long as the child process does not change (write to) its memory, it shares the underlying physical memory with the parent for reading. When the child process tries to modify some memory, the kernel creates a separate copy of the particular page for the child, and that page is not shared anymore (until the child process forks itself). This is known as Copy on Write (CoW).
So the real thing is that the text section can be shared by mapping different virtual pages to the same physical pages (called frames).
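A small sketch to make the CoW behaviour concrete: after fork(), parent and child print the same virtual address, yet the child's write lands on its own private copy of the page:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int value = 1;
        pid_t pid = fork();

        if (pid == 0) {      /* child */
            value = 2;       /* write triggers CoW: the kernel copies the page */
            printf("child:  &value = %p, value = %d\n", (void *)&value, value);
            exit(0);
        }
        wait(NULL);          /* parent */
        printf("parent: &value = %p, value = %d (unchanged)\n",
               (void *)&value, value);
        return 0;
    }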
On my machine (XP 64-bit) the ASP.NET worker process (w3wp.exe) always launches with 5.5GB of virtual memory reserved. This happens regardless of the web application it's hosting (it can be anything, even an empty aspx page).
This big old chunk of virtual memory is reserved at the moment the process starts, so this isn't a gradual memory "leak" of some sort.
Some snooping around with windbg shows that the memory in question is Private, Reserved and RegionUsageIsVAD, which indicates it might be the work of someone calling VirtualAlloc. It also shows that the memory in question is allocated/reserved in 4 big chunks of 1GB each and several smaller ones (1/4GB each).
So I guess I need to figure out who's calling VirtualAlloc and reserving all this memory. How do I do that?
Attaching a debugger to the process before the memory allocation happens is tricky, because w3wp.exe is launched by svchost.exe (that is, by IIS/the ASP.NET filter), and if I try to launch it myself in order to debug it, it just closes down without all this profuse memory reservation. Also, the command-line parameters are invalid if I reuse them (which makes sense, because they reference a pipe created by the calling process).
I can attach windbg to the process after the fact (which is how I found the memory regions in question), but I'm not sure it's possible at that point to determine who allocated what.
David Wang answered this in a similar question:
[...] the ASP.Net performance developer tells me that:
The Reserved virtual memory is nothing to worry about. You can view it as a performance/caching prerequisite of the CLR. And heavy load testing shows that it is nothing to worry about.
System.Windows.Forms - It's not pulled in by an empty hello-world ASPX page. You can use Microsoft Debugging Tools and "sx e ld system.windows.forms" to identify what is actually pulling it in at runtime. Or you can use ildasm to find the dependency.
mscorlib - make sure it is GAC'd and NGen'd properly.
Virtual memory is just address space allocated to the process. It has nothing to do with physical memory usage.
See:
Virtual Memory
Pushing the Limits of Windows: Virtual Memory
http://support.microsoft.com/kb/555223
Reserved memory is very different from allocated memory. Reserving memory just allocates address space. It doesn't commit any physical pages.
This address space is likely allocated by IIS for its heap. It will only commit pages when needed.
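The reserve/commit distinction is easy to see with VirtualAlloc directly; here is a sketch (the sizes are arbitrary, and a 1GB reservation may fail on a 32-bit process):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Reserve 1 GB of address space - no physical pages are committed. */
        char *base = (char *)VirtualAlloc(NULL, 1 << 30, MEM_RESERVE, PAGE_NOACCESS);
        if (base == NULL) return 1;

        /* Commit just 64 KB of it; only this part can be touched. */
        char *p = (char *)VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
        p[0] = 'x'; /* the page is backed by physical memory on first touch */

        printf("reserved 1 GB at %p, committed the first 64 KB\n", (void *)base);
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }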
If you really want to launch w3wp.exe from windbg, you probably need to launch it with valid command-line arguments. You can use Process Explorer to determine what the command line for the current w3wp.exe process is. For instance, on my server, mine was:
c:\windows\system32\inetsrv\w3wp.exe -a \\.\pipe\iisipmeca56ca2-3a28-452a-9ad3-9e3da7b7c765 -t 20 -ap "DefaultAppPool"
I'm not sure what the UID in there specifies, but it looks like it's probably generated on the fly by the W3SVC service (which is what launched w3wp.exe) to name the pipe specified there. So you should definitely look at your own command line before launching w3wp from windbg.