Julia's garbage collection of ccall allocated data

I was hoping someone could clarify one aspect of the behaviour of the Julia garbage collector and how it interacts with memory allocated by a call to a C function using ccall.
For example, I am making the following call:
setup::Ptr{Void} = ccall(("vDSP_DCT_CreateSetup", libacc), Ptr{Void},
(Ptr{Void}, UInt64, UInt64),
previous, length, dct_type)
This function allocates and initializes memory for a DFT_Setup object (the details of this are irrelevant). The library also provides a destructor to be called on the DFT_Setup to deallocate the memory once the object is no longer needed.
Is calling the destructor necessary in Julia? That is, does the garbage collector free the DFT_Setup when appropriate, or should I make a call to the C deallocator?

Yes. The Julia GC can only clean up memory allocated by Julia itself; it has no knowledge of memory allocated inside a ccall.
The usual way to solve this is to call the C destructor from a finalizer that is registered in the constructor of a wrapper type; see RCall.jl for an example.

Related

Rust Global.dealloc vs ptr::drop_in_place vs ManuallyDrop

I'm relatively new to Rust. I was working on some lock-free algorithms, and started playing around with manually managing memory, something similar to C++ new/delete. I noticed a couple different ways that do this throughout the standard library components, but I want to really understand the differences and use cases of each. Here's what it seems like to me:
ManuallyDrop<Box<T>> will prevent Box's destructor from running. I can save a raw pointer to the ManuallyDrop element, and have the actual element go out of scope (what would normally be dropped in Rust) without being dropped. I can later call ManuallyDrop::drop(&mut *ptr) to drop this value manually.
I can also dereference the ManuallyDrop<Box<T>> element, save a raw pointer to just the Box<T>, and later call std::ptr::drop_in_place(box_ptr). This is supposed to destroy the Box itself and drop the heap-allocated T.
Looking at the ManuallyDrop::drop implementation, it looks like those are literally doing the exact same thing. Since ManuallyDrop is zero cost and just stores a value in its struct, is there any difference between the above two approaches?
I can also call std::alloc::Global.dealloc(...), which looks like it will deallocate the memory block without calling drop. So if I call this on a pointer to Box<T>, it'll deallocate the heap pointer, but won't call drop, so T will still be lying around on the heap. I could call it on a pointer to T itself, which will remove T.
From exploring the standard library, it looks like Global.dealloc gets called in the raw_vec implementation to actually remove the heap-allocated array that Vec points to. This makes sense, since it's literally trying to remove a block of memory.
Rc has a drop implementation that looks roughly like this:
// destroy the contained object
ptr::drop_in_place(self.ptr.as_mut());
// remove the implicit "strong weak" pointer now that we've
// destroyed the contents.
self.dec_weak();
if self.weak() == 0 {
Global.dealloc(self.ptr.cast(), Layout::for_value(self.ptr.as_ref()));
}
I don't really understand why it needs both the dealloc and the drop_in_place. What does the dealloc add that the drop_in_place doesn't do?
Also, if I just save a raw pointer to a heap-allocated value by doing something like Box::into_raw(Box::new(5)), does my pointer now control that memory allocation? As in, will it remain alive until I explicitly call ptr::drop_in_place()?
Finally, when I was playing with all this, I ran into a strange issue. After running ManuallyDrop::drop or ptr::drop_in_place on my raw pointer, I then tried running println! on the pointer's dereferenced value. Sometimes I get a scary heap error and my test fails, which is what I would expect. Other times, it just prints the same value, as if no drops happened. I also tried running ManuallyDrop::drop multiple times on the exact same value, and same thing. Sometimes a heap error, sometimes totally fine, and the same value prints out.
What is happening here?

If you come from C++, you can think of drop_in_place as calling the destructor manually, and dealloc as calling old C free.
They serve different purposes:
drop_in_place just runs the value's destructor (Drop::drop, followed by the destructors of its fields), which releases the resources held by your type.
dealloc frees the memory pointed to by a pointer, previously allocated with alloc.
You seem to think that drop_in_place also frees the memory, but that is not the case. I think your confusion arises because Box<T> contains a dynamically allocated object, so its Box::drop implementation does release the memory used by that object, after calling its drop_in_place, of course.
That is what you see in the Rc implementation, first it calls the drop_in_place (destructor) of the inner object, then it releases the memory.
As for what happens if you call drop_in_place several times in a row: the function is unsafe for a reason, and you will most likely get undefined behavior. From the docs:
...if T is not Copy, using the pointed-to value after calling drop_in_place can cause undefined behavior.
Note the can cause. I think it is perfectly possible to write a type that allows calling drop several times, but it doesn't sound like such a good idea.
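The split between running the destructor and freeing the memory can be sketched by doing both steps by hand. This is only an illustration, not how Box or Rc are actually written: the Tracked type, its drops counter, and the manual_lifecycle function are hypothetical names invented for this example, and it uses the stable free functions std::alloc::alloc and std::alloc::dealloc rather than Global.dealloc.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::cell::Cell;
use std::ptr;

// Hypothetical type: counts how many times its destructor has run.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
    _name: String, // owns a second heap allocation, freed by the drop glue
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the destructor count before and after drop_in_place.
fn manual_lifecycle() -> (u32, u32) {
    let drops = Cell::new(0);
    let layout = Layout::new::<Tracked<'_>>();
    unsafe {
        // 1. Allocate raw, uninitialized memory (the "malloc" step).
        let p = alloc(layout) as *mut Tracked<'_>;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        // 2. Construct a value in place without dropping the garbage that was there.
        ptr::write(p, Tracked { drops: &drops, _name: String::from("payload") });
        let before = drops.get();
        // 3. Run the destructor: Tracked::drop runs and the String is freed,
        //    but the block that p points to is still allocated.
        ptr::drop_in_place(p);
        let after = drops.get();
        // 4. Release the block itself (the "free" step). No destructor runs here.
        dealloc(p as *mut u8, layout);
        (before, after)
    }
}

fn main() {
    println!("{:?}", manual_lifecycle());
}
```

Skipping step 3 would leak the String; skipping step 4 would leak the block; repeating either step on the same pointer is undefined behavior.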

the suggested way to use clEnqueueMapBuffer and clEnqueueUnmapMemObject when implementing zero copy

I am experimenting with deep learning in OpenCL, and the output size of the tensor is fixed.
In CUDA, I can implement zero copy via cudaMallocHost, which can be called once during initialization. I can then read the tensor's output from the host without explicitly calling cudaMemcpy.
This is very efficient, since it is called only once over the entire execution of my program; I don't need to call cudaMallocHost after every forward pass.
When I try to implement zero copy in OpenCL, I see that some implementations call clEnqueueMapBuffer and clEnqueueUnmapMemObject after every forward pass whenever they want to read the tensor's output.
Here is an example (https://github.com/alibaba/MNN/blob/master/source/backend/opencl/core/OpenCLBackend.cpp#L291).
But I find that the overhead of clEnqueueMapBuffer cannot be neglected; sometimes the latency is quite large.
Is this really the suggested way to do it? Can I call clEnqueueMapBuffer only once in the lifetime of my program and call clEnqueueUnmapMemObject once at the end of my program? Is there any problem with doing so?

If your OpenCL implementation supports Shared Virtual Memory (introduced in 2.0), that feature allows you to do something similar, and much more.
For OpenCL 1.x, unless your OpenCL implementation makes any guarantees above and beyond the standard (which I'd expect it to do via an extension), you must unmap a buffer before a kernel gets write access to it, and likewise, you must not allow a kernel to read from it while it is mapped for writing.
This is explained in the clEnqueueMapBuffer specification:
Reads and writes by a kernel executing on a device to a memory region(s) mapped for writing are undefined.
The behavior of writes by a kernel executing on a device to a mapped region of a memory object is undefined.
In version 1.2, this was expanded, but the gist is the same:
If a memory object is currently mapped for writing, the application must ensure that the memory object is unmapped before any enqueued kernels or commands that read from or write to this memory object or any of its associated memory objects (sub-buffer or 1D image buffer objects) or its parent object (if the memory object is a sub-buffer or 1D image buffer object) begin execution; otherwise the behavior is undefined.
If a memory object is currently mapped for reading, the application must ensure that the memory object is unmapped before any enqueued kernels or commands that write to this memory object or any of its associated memory objects (sub-buffer or 1D image buffer objects) or its parent object (if the memory object is a sub-buffer or 1D image buffer object) begin execution; otherwise the behavior is undefined.
If you find that map/unmap has a high overhead, you are probably not hitting a zero-copy code path in your OpenCL implementation, and the driver is actually copying the memory contents. If in doubt, check with your implementation vendor to see how they recommend you implement zero-copy buffers in OpenCL. Zero-copy buffers are not guaranteed by the standard.

Store a pointer to lisp object in system area memory

I want to use Common Lisp to process something for a C program, and for various reasons I need to use SBCL.
I wonder how to correctly store a pointer to lisp object in system area memory which is allocated by a C function. For example,
struct c_struct {
...
lispobj *obj;
...
};
With sb-kernel:get-lisp-obj-address, I can get the pointer to a Lisp object. But it makes no sense to store it in foreign memory. The main problem is that the GC moves objects. sb-sys:with-pinned-object only pins objects during the extent of its body, and it's obviously a bad idea to pin an object for a long time. So I need some way to tell the GC to update the pointer when the pointed-to object is moved.

While I don't believe (although I'm eager to be corrected) that SBCL allows one to "pin" the pointer-address of an object for a very long time, nor is the garbage collector easily extensible to updating “foreign copies” of pointers to objects, you can obtain a persistent pointer to a Lisp callback function; i.e. a pointer which a C program can funcall which is actually a Lisp function, using defcallback.
One (untested) theory might be to wrap your C language calls in such a way:
The C function allocates c_struct with a NULL pointer slot.
You provide a (defcallback …) function pointer to your C program; let's call it void with_locked_access(void (*interior_function)(struct c_struct *), struct c_struct *struct_to_use).
When C functions want to access this pointer, they call with_locked_access with their own function pointer interior_function and the pointer to the c_struct that interests them.
with_locked_access is actually (defcallback call-c-with-pinned-object …); it, in turn, calls sb-sys:with-pinned-object, obtains the system-area pointer for the object, and stores it into c_struct before calling interior_function with the (now-populated) structure as its parameter.
interior_function does whatever it wants the pinned Lisp object for; it returns, and call-c-with-pinned-object closes out the with-pinned-object form and returns itself.
Naturally, this depends entirely upon what it is you want to do in your C code, and whether it's going to be running in parallel with Lisp code that might be negatively impacted by the pinning, &c &c.
Alternatively, in the special (but common) case that the object in question happens to be a byte vector (e.g. perhaps a buffer of some kind), you might be able to take advantage of cffi-sys:make-shareable-byte-vector and cffi-sys:with-pointer-to-vector-data, q.v.

what happens when you invoke malloc() on a unix system

The malloc() library function internally calls the brk() or sbrk() system call, which allocates memory for the data region, so local static variables and global variables will have their memory allocated from the heap, increasing the effective size of the data region. Now my question is: what exactly happens when I allocate memory to int *a, which is a local variable?
I might have a misconception; please let me know if so. Thanks.

int *p itself is a local variable, which is a pointer (these days: usually four or eight bytes, usually on the stack or in a register). When you do p = malloc(...), you are allocating memory (on the heap - or what is these days conventionally called 'the heap' even if a heap is not the structure used to manage free memory) and assigning a pointer to that memory into p.

When you call malloc() you get access to the amount of memory requested, or NULL is returned. That is all that is guaranteed. Everything else is implementation dependent. The mechanism by which you get access to that memory can be quite varied.

Garbage collection vs. shared pointers

What are the differences between shared pointers (such as boost::shared_ptr or the new std::shared_ptr) and garbage collection methods (such as those implemented in Java or C#)? The way I understand it, shared pointers keep track of how many variables point to the resource and will automatically destruct the resource when the count reaches zero. However, my understanding is that the garbage collector also manages memory resources, but requires additional resources to determine if an object is still being referred to and doesn't necessarily destruct the resource immediately.
Am I correct in my assumptions, and are there any other differences between using garbage collectors and shared pointers? Also, why would anyone ever use a garbage collector over a shared pointer if they perform similar tasks but with varying performance figures?

The main difference lies, as you noted, in when the resource is released/destroyed.
One advantage where a GC might come in handy is if you have resources that take a long time to be released. For a short program lifetime, it might be nice to leave the resources dangling and have them cleaned up in the end. If resource limits are reached, then the GC can act to release some of them. Shared pointers, on the other hand, release their resources as soon as the reference count hits zero. This could be costly for frequent acquisition-release cycles of a resource with costly time requirements.
On the other hand, in some garbage collection implementations, garbage collection requires that the whole program pause its execution while memory is examined, moved around, and freed. There are smarter implementations, but none are perfect.

Shared pointers (a form of reference counting) run the risk of cycles: objects that refer to each other keep each other's count above zero, so they are never released.
Tracing garbage collection (e.g. mark and sweep) does not have this problem.
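The cycle problem can be sketched with Rust's Rc, which is a reference-counted shared pointer comparable to std::shared_ptr. The Node and WeakNode types and both function names are hypothetical, chosen only for this illustration. The first function builds a two-node cycle and shows that the nodes survive after every owner has been dropped; the second replaces one direction with a Weak pointer, the usual workaround, so the counts can reach zero.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node that holds a strong pointer to another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a <-> b with strong pointers in both directions, drops both handles,
// and returns how many strong references to `a` remain.
fn strong_cycle_leaks() -> usize {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    let probe = Rc::downgrade(&a); // observer that does not keep `a` alive
    drop(a);
    drop(b);
    // `b` still points at `a` and `a` still points at `b`: both are leaked.
    probe.strong_count()
}

// Same shape, but the back edge is weak, so it does not contribute to the count.
struct WeakNode {
    next: RefCell<Option<Rc<WeakNode>>>,
    _prev: RefCell<Weak<WeakNode>>,
}

fn weak_back_edge_frees() -> usize {
    let a = Rc::new(WeakNode {
        next: RefCell::new(None),
        _prev: RefCell::new(Weak::new()),
    });
    let b = Rc::new(WeakNode {
        next: RefCell::new(None),
        _prev: RefCell::new(Rc::downgrade(&a)),
    });
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    let probe = Rc::downgrade(&a);
    drop(a); // count of `a` hits zero, which also releases its pointer to `b`
    drop(b);
    probe.strong_count()
}

fn main() {
    println!("strong cycle: {} reference(s) left", strong_cycle_leaks());
    println!("weak back edge: {} reference(s) left", weak_back_edge_frees());
}
```

A tracing collector would reclaim the first pair as well, because it asks whether the nodes are reachable from the program's roots rather than whether their counts are zero.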

In a simple garbage-collected system, nobody will hold a direct pointer to any object; instead, code will hold references to table entries which point to objects on the heap. Each object on the heap will store its size (meaning all heap objects will form a singly-linked list) and a back-reference to the object in the object table which holds it (or at least used to).
When either the heap or the object table gets full, the system will set a "delete me" flag on every object in the table. It will then examine every object it knows to be reachable and, if that object's "delete me" flag is set, unset it and add all the objects it refers to to the list of objects to be examined. Once that is done, any object whose "delete me" flag is still set can be deleted.
Once that is done, the system will start at the beginning of the heap, take each object stored there, and see if its object reference still points to it. If so, it will copy that object to the beginning of the heap, or just past the end of the last copied object; otherwise the object will be skipped (and will likely be overwritten when other objects are copied).

In languages with a garbage collector (GC), the GC keeps track of and cleans up memory that isn’t being used anymore, and we don’t need to think about it. In most languages without a GC, it’s our responsibility to identify when memory is no longer being used and to call code to explicitly free it, just as we did to request it.
