Can a memory address tell you anything about how/where the object is stored - pointers

Is there a way you can identify whether an object is stored on the stack or heap solely from its memory address? I ask because it would be useful to know this when debugging and a memory address comes up in an error.
For instance:
If have a memory address: 0x7fd8507c6
Can I determine anything about the object based on this address?

You don't mention which OS you are using. I'll answer for Microsoft Windows as that's the one I've been using for the last 25 years. Most of what I knew about Unix/Linux I've forgotten.
If you just have the address and no other information - for 32 bit Windows you can tell if it's user space (lower 2GB) or kernel space (upper 2GB), but that's about it (assuming you don't have the /3GB boot option).
If you have the address and you can run some code you can use VirtualQuery() to get information about the address. If you get a non-zero return value you can use the data in the returned MEMORY_BUFFER_INFORMATION data.
The State, Type, and Protect values will tell you about the possible uses for the memory - whether it's memory mapped, a DLL (Type & MEM_IMAGE != 0), etc. You can't infer from this information if the memory is a thread's stack or if it's in a heap. You can however determine if the address is in memory that isn't heap or stack (memory in a DLL is not in a stack or heap, non-accessible memory isn't in a stack or a heap).
To determine where a thread stack is you could examine all pages in the application looking for a guard page at the end of a thread's stack. You can then infer the location of the stack space using the default stack size stored in the PE header (or if you can't do that, just use the default size of 1MB - few people change it) and the address you have (is it in the space you have inferred?).
To determine if the address is in a memory heap you'd need to enumerate the application heaps (GetProcessHeaps()) and then enumerate each heap (HeapWalk()) found checking the areas being managed by the heap. Not all Windows heaps can be enumerated.
To get any further than this you need to have tracked allocations/deallocations etc to each heap and have all that information stored for use.
You could also track when threads are created/destroyed/exit and calculate the thread addresses that way.
That's a broad brush answer informed by my experience creating a memory leak detection tool (which needs this information) and numerous other tools.

Related

OpenCL Buffer Creation

I am fairly new to OpenCL and though I have understood everything up until now, but I am having trouble understanding how buffer objects work.
I haven't understood where a buffer object is stored. In this StackOverflow question it is stated that:
If you have one device only, probably (99.99%) is going to be in the device. (In rare cases it may be in the host if the device does not have enough memory for the time being)
To me, this means that buffer objects are stored in device memory. However, as is stated in this StackOverflow question, if the flag CL_MEM_ALLOC_HOST_PTR is used in clCreateBuffer, the memory used will most likely be pinned memory. My understanding is that, when memory is pinned it will not be swapped out. This means that pinned memory MUST be located in RAM, not in device memory.
So what is actually happening?
What I would like to know what do the flags:
CL_MEM_USE_HOST_PTR
CL_MEM_COPY_HOST_PTR
CL_MEM_ALLOC_HOST_PTR
imply about the location of buffer.
Thank you
Let's first have a look at the signature of clCreateBuffer:
cl_mem clCreateBuffer(
cl_context context,
cl_mem_flags flags,
size_t size,
void *host_ptr,
cl_int *errcode_ret)
There is no argument here that would provide the OpenCL runtime with an exact device to whose memory the buffer shall be put, as a context can have multiple devices. The runtime only knows as soon as we use a buffer object, e.g. read/write from/to it, as those operations need a command queue that is connected to a specific device.
Every memory object an reside in either the host memory or one of the context's device's memories, and the runtime might migrate it as needed. So in general, every memory object, might have a piece of internal host memory within the OpenCL runtime. What the runtime actually does is implementation dependent, so we cannot not make too many assumptions and get no portable guarantees. That means everything about pinning etc. is implementation-dependent, and you can only hope for the best, but avoid patterns that will definitely prevent the use of pinned memory.
Why do we want pinned memory?
Pinned memory means, that the virtual address of our memory page in our process' address space has a fixed translation into a physical memory address of the RAM. This enables DMA (Direct Memory Access) transfers (which operate on physical addresses) between the device memory of a GPU and the CPU memory using PCIe. DMA lowers the CPU load and possibly increases copy speed. So we want the internal host storage of our OpenCL memory objects to be pinned, to increase the performance of data transfers between the internal host storage and the device memory of an OpenCL memory object.
As a basic rule of thumb: if your runtime allocates the host memory, it might be pinned. If you allocate it in your application code, the runtime will pessimistically assume it is not pinned - which usually is a correct assumption.
CL_MEM_USE_HOST_PTR
Allows us to provide memory to the OpenCL implementation for internal host-storage of the object. It does not mean that the memory object will not be migrated into device memory if we call a kernel. As that memory is user-provided, the runtime cannot assume it to be pinned. This might lead to an additional copy between the un-pinned internal host storage and a pinned buffer prior to device transfer, to enable DMA for host-device-transfers.
CL_MEM_ALLOC_HOST_PTR
We tell the runtime to allocate host memory for the object. It could be pinned.
CL_MEM_COPY_HOST_PTR
We provide host memory to copy-initialise our buffer from, not to use it internally. We can also combine it with CL_MEM_ALLOC_HOST_PTR. The runtime will allocate memory for internal host storage. It could be pinned.
Hope that helps.
The specification is (deliberately?) vague on the topic, leaving a lot of freedom to implementors. So unless an OpenCL implementation you are targeting makes explicit guarantees for the flags, you should treat them as advisory.
First off, CL_MEM_COPY_HOST_PTR actually has nothing to do with allocation, it just means that you would like clCreateBuffer to pre-fill the allocated memory with the contents of the memory at the host_ptr you passed to the call. This is as if you called clCreateBuffer with host_ptr = NULL and without this flag, and then made a blocking clEnqueueWriteBuffer call to write the entire buffer.
Regarding allocation modes:
CL_MEM_USE_HOST_PTR - this means you've pre-allocated some memory, correctly aligned, and would like to use this as backing memory for the buffer. The implementation can still allocate device memory and copy back and forth between your buffer and the allocated memory, if the device does not support directly accessing host memory, or if the driver decides that a shadow copy to VRAM will be more efficient than directly accessing system memory. On implementations that can read directly from system memory though, this is one option for zero-copy buffers.
CL_MEM_ALLOC_HOST_PTR - This is a hint to tell the OpenCL implementation that you're planning to access the buffer from the host side by mapping it into host address space, but unlike CL_MEM_USE_HOST_PTR, you are leaving the allocation itself to the OpenCL implementation. For implementations that support it, this is another option for zero copy buffers: create the buffer, map it to the host, get a host algorithm or I/O to write to the mapped memory, then unmap it and use it in a GPU kernel. Unlike CL_MEM_USE_HOST_PTR, this leaves the door open for using VRAM that can be mapped directly to the CPU's address space (e.g. PCIe BARs).
Default (neither of the above 2): Allocate wherever most convenient for the device. Typically VRAM, and if memory-mapping into host memory is not supported by the device, this typically means that if you map it into host address space, you end up with 2 copies of the buffer, one in VRAM and one in system memory, while the OpenCL implementation internally copies back and forth between the 2.
Note that the implementation may also use any access flags provided ( CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, and CL_MEM_READ_WRITE) to influence the decision where to allocate memory.
Finally, regarding "pinned" memory: many modern systems have an IOMMU, and when this is active, system memory access from devices can cause IOMMU page faults, so the host memory technically doesn't even need to be resident. In any case, the OpenCL implementation is typically deeply integrated with a kernel-level device driver, which can typically pin system memory ranges (exclude them from paging) on demand. So if using CL_MEM_USE_HOST_PTR you just need to make sure you provide appropriately aligned memory, and the implementation will take care of pinning for you.

Layout of ELF binary in virtual memory

All modern *nix operating systems use virtual memory concept (with paging). And as far as i know, this concept of virtual memory is used to set a layer of abstraction between the programmer and the real physical memory: the programmer doesn't have to be limited to ram size and he can see the program as a large contiguous space of data, instructions, heap and stack (manipulate pointers according to that concept). When we compile & link a source code we get an executable file stored on HDD known as ELF, that file contains all data and instructions of the program beside some additional information like stack and heap sizes (only created at runtime).
Now my questions:
1. How does this binary file (elf) is mapped to virtual memory ?
2. Does every process has its own virtual memory (a page file !!!) ?
3. What is the program's layout after being mapped to virtual memory ?
4. What is exactly the preferred base address and how does it look in virtual memory ?
5. What is the difference between a RVA and an Offset ?
You don't have to answers all the questions or give detailed answers instead you can provide me with good full readings about the subject, thanks.
How does this binary file (elf) is mapped to virtual memory ??
The executable file contains instructions to the loader on how to lay out the address space. On some systems, parts of the executable can be mapped to memory and serve as a page file.
Does every process has its own virtual memory (a page file !!!) ?
Every process has its own logical address space. Some areas within that address space may be shared with other processes.
What is the program's layout after being mapped to virtual memory ?
The depends upon the system and what the executable told the loader to do.
What is exactly the preferred base address and how does it look in virtual memory ?
That is just the desirable start location for loading something in memory. Most compilers generate relocatable code that is not tied to any specific logical address.
What is the difference between a RVA and an Offset ?
RVA is a screwed up unixism for an offset. What is not clear, in your question is what type of offset you are talking about. There are byte offsets from pages. RVA is usually an offset from a loading location that can span pages.

how does the OS determine null pointer access without checking all pointer addresses?

It is known that the 0 address (which is marked as the macro 'NULL'), is not legal to access.
I was wondering how is it that the operating system (say linux) can determine when there is an access to null address, somewhere in the code, without having to access each and every pointer address in the code?
I assume it has something to do with signal and specifically, the "sigsegv" signal.
But I'm not sure how it's done.
First of all a null pointer access is not necessarily invalid. Typically, either the operating system's program loader or the linker (depending upon the system) set up processes so that the the lowest page in the virtual address space is not mapped.
Many systems that do this also allow the application to map the first page, making a null reference valid.
The NULL pointer is checked the same way all other memory addresses are checked: through the logical address translation of the CPU.
Each time the processor accesses memory (ignoring caching) it looks up the address in the process's page table. If there is no corresponding entry, the processor triggers an access fault (that in Unix variants gets translated into a signal).
If there is an entry in the page table for the address, the processor checks the access allowed for the page. If you are in user mode and try to access a kernel protected page, that triggers a fault. If you are trying to write to a read only page, that triggers a fault. If you try to execute a non-executable page, that triggers a fault.
This is a rather lengthy topic. You need to understand logical memory translation (sometimes misnamed virtual memory) if you want to learn more on the topic.
Pointers refer to virtual address space. In the virtual address space, each page of memory can be mapped to real physical memory. The operating system takes care of this mapping separately for each process.
When you access memory through a pointer, the CPU looks at the mapping for the virtual address your pointer specifies and checks if there is real, physical memory behind. Additional checks are done to verify that you have read or write access to that piece of memory, depending on the operation you are attempting.
If there is no memory mapped for that address, the CPU generates a hardware interrupt. The OS catches that interrupt and - usually - signals sigsegv for the calling process.
The zero page containing the NULL address is usually intentionally left unmapped, so that NULL pointer accesses, which usually result from programming errors, are easily trapped.
Linux obtains this support from hardware. Processor is informed about the purpose of individual memory regions and their availability. If "unavailable" memory region is accessed the processor informs the operating system about the problem and the operating system informs the application.
It means two things:
There is no software overhead related to checking all pointers against the NULL value.
There is no precise check for allowed pointer values.
In other words, if your pointer points anywhere to the "available" memory then the hardware unit is unable to recognize the problem.
The Memory Management Unit plays a key role in the exception triggering when a NULL pointer is dereferenced or an invalid address is accessed.
During the normal virtual-to-physical memory mapping process done by MMU on each RAM access, the undefined address is simply not found in the range of virtual addresses defined in the MMU descriptors. This can have catastrophic consequences if occurred in OS kernel-space, or just process kill and cleanup in the user-space domain.
...how is it that the operating system (say linux) can determine when there is an access to null address, somewhere in the code, without having to access each and every pointer address in the code?
Well, OS cannot determine a NULL dereference without accessing the pointer. From the wiki for segmentation fault:
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. The OS kernel will in response usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal....
The memory access violation is a run-time incident, and unless there is an invalid access, there is no way OS will raise the signal to the process.
FWIW, a process is allowed to access the memory allocated for it (in virtual address space). Any address, outside the allocated virtual address space, if accessed, will generate a fault (through MMU) which in turn, generates the segmentation fault.
TL;DR - SIGSEV is generated on encountering the NULL-pointer dereference, not before that. Also, OS does not detect the erroneous access itself, rather it is informed to the OS by the Memory Management Unit via raising a fault.

Getting current transferred MPI network communication volume

I have a question related to MPI.
In order to keep track of the communication volume used by my implementation, I would like to get the currently-transferred data amount since the mpi-process' start until the current measure-point.
I checked the specification as well as the mpi.h header file of mpich and did not find a matching function to call or variable that keeps track of the network transfer costs. It would, of course, be possible to implement a small traffic registry or define a macro for tracking communication sizes, but maybe it can be read out from somewhere.
Do you know a method to gain the current transfer size, maybe it is also possible to get this number using a system call to get the network traffic size of the process?
Is it maybe possible to access the proc information of the current process, maybe the /proc/net is maintained per process as well, such as /proc/self/net?
Thank you in advance,
Martin

Do I need to initialize stack in GAS?

Hallo! Currently I'm learning basics of assembly. Earlier I was using TASM and Intel-syntax. There I had to initialize stack in some ways.
But now I'm using GNU Assembler and AT&T syntax. I looked through lots of examples and saw no any declaration/initialization of stack. I wonder if I have to do it? Or, may be, it's made without my help here? If is so, how exactly is it initialized automatically? Are there risks to rub important info in data-segment? I didn't also notice any directives concerning stack.
Thanks for your answers beforehand!
Oh, one more thing: are there any good books concerning programming in ASM (GAS) for Unix-like systems?
An OS with Virtual Memory handles the stack somewhat differently than how an OS without Virtual Memory handles it.
No VM (e.g. DOS, µClinux !MMU): you reserve some physical space for the stack. In DOS it depends on the memory model you use, for larger memory models you will allocate some memory and point SS (the stack segment) to it. In µClinux you will save the stack size in a field of the executable file format's header, see the bFLT format for an example.
VM → the stack grows dynamically, up to a configurable limit (see ulimit -s on Linux). Since each process has its own virtual address space, there is a lot of space between the stack and any other mapped virtual memory area.

Resources