I am exploring the lower level workings of the system, and was wondering how malloc determines the start address of the heap. Is the heap at a constant offset or is there a call of some sort to get the start address? Does the stack affect the start address of the heap?
sbrk returns the start address of the bytes it adds (or removes). In a fresh process with no heap allocated yet, the first call to sbrk should then return the start address of the "break" section of the heap. If I had to bet, that's what malloc implementations which use brk/sbrk probably do on their first run.
Traditionally, the heap started just above the text section and grew up; stack frames didn't affect start address at all as they grow down towards the unmapped 0 page. However, it's more common these days for
The first address to be randomized, to make it harder for exploits to hit the right address in memory
The heap to be non-contiguous, as malloc() usually just calls mmap() to get an address anywhere in the virtual address space
Related
-module(demo).
-export([factorial/1]).
factorial(0) -> 1;
factorial(N) ->
N * factorial(N-1).
The factorial is not tail recursive but why is it not overflowing the stack? I am able to get factorial of 100,000 without stack overflow but takes some time to compute.
An Erlang process's "stack" is not stored in the stack given by the system to the process (which is usually a few megabytes) but in the heap. As far as I know, it will grow unbounded until the system refuses to give the VM more memory.
The size includes 233 words for the heap area (which includes the stack). The garbage collector increases the heap as needed.
The main (outer) loop for a process must be tail-recursive. Otherwise, the stack grows until the process terminates.
Source
If you monitor the Erlang VM process in an process monitor like Activity Monitor on OSX or top on other UNIX-like systems, you'll see that the memory usage will keep on increasing until the calculation is complete, at which point a part of the memory (the one where the "stack" is stored) will be released (this happens gradually over a few seconds after the function returns for me).
I am new to UNIX, and I am studying some of UNIX system calls such as brk(), sbrk(), and so on....
Last day I have read about malloc() function, and I was confused a little bit!
Can anybody tell me why malloc reduces the number of sbrk() system calls that the program must perform?
And another question, do brk(0), sbrk(0) and malloc(0) return the same value?
Syscalls are expensive to process because of the additional overhead that a syscall places: you have to switch to kernel mode. A system call gets into the kernel by issuing a "trap" or interrupt. It's a call to the kernel for a service, and because it executes in the kernel address space, it has a high overhead switch to kernel (and then switching back).
This is why malloc reduces the number of calls to sbrk() and brk(). It does so by requesting more memory than you asked it to, so that it doesn't have to issue a syscall everytime you need more memory.
brk() and sbrk() are different.
brk is used to set the end of the data segment to the value you specify. It says "set the end of my data segment to this address". Of course, the address you specify must be reasonable, the operating system must have enough memory, and you can't make it point to somewhere that would otherwise exceed the process maximum data size. Thus, brk(0) is invalid, since you'd be trying to set the end of the data segment to address 0, which is nonsense.
On the other hand, sbrk increments the data segment size by the amount you specify, and returns a pointer to the previous break value. Calling sbrk with 0 is valid; it is a way to get a pointer to the current data segment break address.
malloc is not a system call, it's a C library function that manages memory using sbrk. According to the manpage, malloc(0) is valid, but not of much use:
If size is 0, then malloc() returns either NULL, or a unique pointer
value that can later be successfully passed to free().
So, no, brk(0), sbrk(0) and malloc(0) are not equivalent: the first of them is invalid, the second is used to obtain the address of the program's break, and the latter is useless.
Keep in mind that you should never use both malloc and brk or sbrk throughout your program. malloc assumes it's got full control of brk and sbrk, if you interchange calls to malloc and brk, very weird things can happen.
why malloc reduces the number of sbrk() system calls that the program
must perform?
say, if you call malloc() to request 10 bytes memory, the implementation may use sbrk (or other system call like mmap) to request 4K bytes from OS. Then when you call malloc() next time to request another 10 bytes, it doesn't have to issue system call; it may just return some memory allocated by system call of the last time 4K.
malloc() function is used to call the sbrk system call to create a memory dynamically during the process.
malloc() function is already assigned in stdlib.h header file so the as per the required function is recursively call by the malloc function using the library function.
with the help of sbrk we need to explicitly declare some thing to call the system call.
According to the size given in function or through system call it return to the variable and store.
sbrk() function increases the programs data segment allocation by specified bytes.
malloc(4096); // sbrk += 4096 Bytes
free(); // freeing memory will not bring down the sbrk by 4096 Bytes
malloc(4096); // malloc'ing again will not increase the sbrk and it will use
the existing space which not result in sbrk() call.
I know how to use clGetDeviceInfo to query information about the device but I don't know how to get information about the device at runtime. For example, how much global memory is in use right now? How busy have the processing elements been, on average, in the last n nanoseconds?
AFAIK, no. OpenCL itself does not have any API to query current status of a device. Those are exposed by the vendor of your particular implementation (like the GPUPerfAPI from AMD or the Graphics Performance analyzer from Intel).
Hope this helps.
What I did to be able to determine the free memory at runtime is write a wrapper around clDevice (or cl::Device in my case) and pipe all buffer allocations through said wrapper.
At the begin of the program, I query the total device memory (CL_DEVICE__GLOBAL_MEM_SIZE) and when buffers are allocated I store their addresses and sizes in a vector so I can subtract the accumulated size of the currently allocated buffers from the total memory.
With OpenCL, you can assign callback calls to the buffers, which are called when the buffer is destroyed (clSetMemObjectDestructorCallback). So I use those to clean up when the buffer is released. Hint: the cl_mem parameter with which the callback is called is NOT a valid mem object. It may have already been destroyed so you cannot query it for its size (that took me a couple of hours, even though it's clearly stated in the standard ...).
This way, I can always know, how much memory is left on the device.
In C/C++ we're used to checking for null pointers before dereferencing them, e.g.
int *p = malloc(sizeof(int));
if (p != 0)
{
/* Do something with the pointer */
}
Hence the memory manager can never return a pointer to the first memory address (where p == 0) as the calling program will assume that the memory could not be allocated.
Does that mean that the first byte or word (for alignment purposes) is always unused, both in the entire system memory space and the process' memory space? Or is this memory used by the system or kernel, which knows which null pointers it can dereference safely?
First of all, to make it clear, malloc returning 0 means signaling an error.
In most modern operating systems the virtual address space (addresses used in a program) is not the same as the physical address space (the real addresses that the memory understands). Most modern operating systems use paging. So the addresses used in a program (the address returned by malloc for example) aren't the same as the physical ones. The OS has some mechanism to make a correspondence between them.
The OS must simply never map anything at the physical address 0 for a regular process and that address will always be invalid if the process tries to access it. The OS itself, for its own benefit can access the memory at address 0 if it so desires.
Yes, sort of. Technically you could store something at address zero, but Windows doesn't allow - you get access violation and Linux doesn't allow - you get segmentation fault. This is done assist in having a designated special value that would mean "a null pointer - a pointer that clearly isn't pointing to any live object". Maybe there're systems where storing data at address zero is outright allowed, but still they would need some special value for a "null pointer".
In most systems, there is something already at physical address 0. Some older processors simply start execution there when they come out of reset. Others may expect a vector table there (addresses for reset and interrupts). Often a ROM will be mapped there to contain these special things the processor is expecting, so it doesn't make sense to ever get a pointer there.
With virtual memory, your app and all its allocations live in a virtual memory space which is mapped to physical memory. My guess would be that the program is mapped in at virtual address 0 and so you'd still never expect malloc to return 0. This is more of a guess on my part, as I'm not intimately familiar with the details of virtual memory layout.
BTW, I've often thought processors should return zero when reading from a NULL pointer without page faulting. This would allow speculative prefetch of data even when there isn't any.
I wrote a reasonably basic memory allocator using sbrk. I ask for a chunk of memory, say 65k and carve it up as needed for variables requesting dynamic memory. I free the memory by adding it back to the 65k block. The 65k block is derived from a union sizeof(16-bytes). Then I align the block along an even 16-byte boundary. But I'm getting unusual behavior.
Accessing the memory appears fine as I allocate and begin to populate my data structures accept that on one of my function calls, I pass a pointer to a member variable in a global structure but the address of the pointer argument doesn't map directly to the address of that member.
For example, the real address of this particular member happens to be: 0x100313d50 but when executing a particular function (nothing special) the address of the member is being represented as 0x100313d70. Inside the debugger I can query the real address and it appears correct when inside the function where this manifests. This isn't the first member being accessed either, it's the third so two prior memory accesses are fine, but during the third access I'm seeing this unusual shifting.
Is it possible that I'm accessing this memory via a misaligned block? It's possible but I'd expect the get a SIGBUS exception thrown (SPARC chip). I'm compiling using -memalign=16s so it ought to SIGBUS instead of trapping and fixing the misalignment.
All of my structures are padded on a multiple of 16-bytes: sizeof(structure)%16 = 0. Has anyone had experience with this type of behavior? Generally speaking, what type of things/stuff/etc. might cause a pointer to misrepresent a memory address?
Cheers,
Tracy.
Solaris 10, SunStudio-12, C language on modern SPARC processor (in case this helps).
I figure I should answer my own question in the event someone else out there has a similar problem.
The reason why the memory address was shifting is because a prior call to a utility function accidentally overwrote the meta-address of the global structure thusly rewriting the meta-address of that block so lookups on that block were shifted even though the actual data still resided in the original block.
In simple words, I wrote past my buffer. Since I hand out memory from the tail, overwriting would blow away my much needed meta-address for my global structure (or whatever). Now I know what undefined behavior looks like.