Store a pointer to lisp object in system area memory - common-lisp

I want to use Common Lisp to process something for a C program. For various reasons I need to use SBCL.
I wonder how to correctly store a pointer to lisp object in system area memory which is allocated by a C function. For example,
struct c_struct {
    ...
    lispobj *obj;
    ...
};
With sb-kernel:get-lisp-obj-address, I can get the pointer to a lisp object. But it makes no sense to store it in foreign memory: the main problem is that the GC moves objects. sb-sys:with-pinned-object only pins an object during the extent of its body, and it's obviously a bad idea to pin an object for a long time. So I need some way to tell the GC to update the pointer when the pointed-to object is moved.

While I don't believe (although I'm eager to be corrected) that SBCL allows one to "pin" the address of an object for an arbitrarily long time, nor is the garbage collector easily extensible to updating "foreign copies" of pointers to objects, you can obtain a persistent pointer to a Lisp callback function using defcallback; i.e. a pointer which a C program can call through that is actually a Lisp function.
One (untested) approach might be to wrap your C calls as follows:
C function allocates c_struct with a NULL pointer slot
You provide a (defcallback …) function pointer to your C program;
let's call it void with_locked_access(void (*interior_function)(struct c_struct *), struct c_struct *struct_to_use)
when C function(s) want to access this pointer, they call with_locked_access with their own function pointer interior_function and the pointer to the c_struct that interests them
with_locked_access is actually (defcallback call-c-with-pinned-object …); it, in turn, calls sb-sys:with-pinned-object to obtain the system-area pointer for the object, stores that pointer into the c_struct, and then calls interior_function with the (now-populated) structure as its parameter.
interior_function does whatever it is that it wants the pinned Lisp object for; it returns, and call-c-with-pinned-object closes out the with-pinned-object form and returns, itself.
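The shape of this wrapper may be easier to see in code. Below is a sketch of the scoped-access pattern in Rust, purely for illustration; with_pinned and the closure are hypothetical stand-ins, not SBCL API. The key property is the same: the caller only ever sees the raw pointer inside the callback, for whose duration the object is guaranteed to stay put.

```rust
// Scoped access: the pointer is only valid inside the callback `f`.
// In SBCL this is where with-pinned-object would pin `obj`; here, a
// shared reference already keeps it alive and in place for the call.
fn with_pinned<T, R>(obj: &T, f: impl FnOnce(*const T) -> R) -> R {
    f(obj as *const T)
}

fn main() {
    let data = vec![1u8, 2, 3];
    let sum = with_pinned(&data, |p| {
        // Simulates the C-side interior_function: it may read through
        // the pointer, but must not retain it past this call.
        let v: &Vec<u8> = unsafe { &*p };
        v.iter().map(|&b| b as u32).sum::<u32>()
    });
    assert_eq!(sum, 6);
    println!("sum = {}", sum);
}
```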
Naturally, this depends entirely upon what it is you want to do in your C code, and whether it's going to be running in parallel with Lisp code that might be negatively impacted by the pinning, &c &c.
Alternatively, in the special (but common) case that the object in question happens to be a byte vector (e.g. perhaps a buffer of some kind), you might be able to take advantage of cffi-sys:make-shareable-byte-vector and cffi-sys:with-pointer-to-vector-data, q.v.

Is every variable and register name just a pointer in NASM Assembly?

There are [] operations which are similar to dereferencing in high-level languages. So does that mean that every variable and register name is just a pointer, or are pointers a high-level language idea with no use in Assembly?
Pointers are a useful concept in asm, but I wouldn't say that symbol names are truly pointers. They're addresses, but they aren't pointers because there's no storage holding them (except metadata, and embedded into the machine code), and thus you can't modify them. A pointer can be incremented.
Variables are a high-level concept that doesn't truly exist in assembly. Asm has labels which you can put at any byte position in any section, including .data, .text, or whatever. Along with directives like dd, you can reserve space for a global variable and attach a symbol to it. (But a variable's value can temporarily be in a register, or for its whole lifetime if it's a local variable. The high-level concept of a variable doesn't have to map to static storage with a label.)
Types like integer vs. pointer also don't really exist in assembly; everything is just bytes that you can load into an integer, XMM, or even x87 FP register. (And you don't really have to think of that as type-punning to integer and back if you use eax to copy the bytes of a float from one memory location to another, you're just loading and storing to copy bytes around.)
But on the other hand, a pointer is a low-enough-level concept to still be highly relevant in assembly. We have the stack pointer (RSP) which usually holds a valid address, pointing to some memory we're using as stack space. (You can use the RSP register to hold values that aren't valid addresses, though. In that case you're not using it as a pointer. But at any time you could execute a push instruction, or mov eax, [rsp], and cause an exception from the invalid address.)
A pointer is an object that holds the address of another object. (I'm using "object" in C terms here: any byte[s] of storage that you can access, including something like an int. Not object as in object-oriented programming.) So a pointer is basically a type of integer data, especially in assembly for a flat memory model where there aren't multiple components to it. For a segmented memory model, a seg:off far pointer is a pair of integers.
So any valid address stored anywhere in register or memory can usefully be thought of as a pointer.
But no, a symbol defined by a label is not a pointer. Conceptually, I think it's important to think of it as just a label. A pointer is itself an object (some bytes of storage), and thus can be modified. e.g. increment a pointer. But a symbol is just a way to reference some fixed position.
In C terms, a symbol is like a char symbol[], not a char *symbol = NULL. If you use bare symbol, you get the address, like mov edi, symbol in NASM syntax. (Or mov edi, OFFSET symbol in GNU .intel_syntax or MASM. See also How to load address of function or label into register for practical considerations like using RIP-relative LEA if 32-bit absolute addresses don't work.)
You can deref any symbol in asm to access the bytes there, whether that's mov eax, [main] to load the first 4 bytes of machine code of that function, or mov eax, [global_arr + rdi*8] to index into an array, or any other x86 addressing mode. (Caveat: 32-bit absolute addresses no longer allowed in x86-64 Linux? for that last example).
But you can't do arr++; that makes no sense. There is no storage anywhere holding that address. It's embedded into the machine code of your program at each site that uses it. It's not a pointer. (Note that C arr[4] = 1 compiles to different asm depending on char *arr; vs. char arr[], but in asm you have to manually load the pointer value from wherever it's stored, and then deref it with some offset.)
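The label-vs-pointer distinction can be made concrete in a higher-level language. Here is a small Rust sketch (illustrative only; ARR plays the role of an asm label attached to static storage, while p is a real pointer object):

```rust
// ARR is like an asm label: a fixed address with no storage holding it.
static ARR: [u8; 4] = [10, 20, 30, 40];

fn main() {
    // `p`, by contrast, is a pointer: an object (some bytes of storage)
    // holding an address, which we can load, store, and increment.
    let mut p: *const u8 = ARR.as_ptr();
    unsafe {
        assert_eq!(*p, 10);
        p = p.add(1); // legal: p is modifiable storage
        assert_eq!(*p, 20);
    }
    // `ARR = ...` would be an error: a label/array name is not
    // assignable storage, just a way to reference a fixed position.
    println!("ok");
}
```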
If you have a label in asm whose address you want to use in C, but that isn't attached to some bytes of storage, you usually want to declare it as extern const char end_of_data_section[]; or whatever.
So for example you can do size_t data_size = data_end - data_start; and get the size in bytes as a link-time constant, if you arranged for those symbols to be at the end/start of your .data section. With a linker script or with global data_end / data_end: in your NASM source. Probably at the same address as some other symbol, for the start.
Assembly language doesn't have variables, pointers or type checking. It has addresses and registers (and no type checking).
A variable (e.g. in C) is a higher level thing - a way to abstract the location of a piece of data so that you don't have to care if the compiler felt like putting it in a register or in memory. A variable also has a type; which is used to detect some kinds of bugs, and used to allow the compiler to automatically convert data to a different type for you.
A pointer (e.g. in C) is a variable (see above). The main difference between a pointer and a "not pointer" is the type of data it contains - for a pointer the variable typically contains an address, but a pointer is not just an address, it's the address of something with a type. This is important - if it was just an address then the compiler wouldn't know how big it is, couldn't detect some kinds of bugs, and couldn't automatically convert data (e.g. consider what would happen for int foo = *pointer_to_char; if the compiler didn't know what type of data the pointer points to).

Rust Global.dealloc vs ptr::drop_in_place vs ManuallyDrop

I'm relatively new to Rust. I was working on some lock-free algorithms, and started playing around with manually managing memory, something similar to C++ new/delete. I noticed a couple different ways that do this throughout the standard library components, but I want to really understand the differences and use cases of each. Here's what it seems like to me:
ManuallyDrop<Box<T>> will prevent Box's destructor from running. I can save a raw pointer to the ManuallyDrop element, and have the actual element go out of scope (what would normally be dropped in Rust) without being dropped. I can later call ManuallyDrop::drop(&mut *ptr) to drop this value manually.
I can also dereference the ManuallyDrop<Box<T>> element, save a raw pointer to just the Box<T>, and later call std::ptr::drop_in_place(box_ptr). This is supposed to destroy the Box itself and drop the heap-allocated T.
Looking at the ManuallyDrop::drop implementation, it looks like those are literally doing the exact same thing. Since ManuallyDrop is zero-cost and just stores a value in its struct, is there any difference between the above two approaches?
I can also call std::alloc::Global.dealloc(...), which looks like it will deallocate the memory block without calling drop. So if I call this on a pointer to Box<T>, it'll deallocate the heap pointer, but won't call drop, so T will still be lying around on the heap. I could call it on a pointer to T itself, which will remove T.
From exploring the standard library, it looks like Global.dealloc gets called in the raw_vec implementation to actually remove the heap-allocated array that Vec points to. This makes sense, since it's literally trying to remove a block of memory.
Rc has a drop implementation that looks roughly like this:
// destroy the contained object
ptr::drop_in_place(self.ptr.as_mut());
// remove the implicit "strong weak" pointer now that we've
// destroyed the contents.
self.dec_weak();
if self.weak() == 0 {
    Global.dealloc(self.ptr.cast(), Layout::for_value(self.ptr.as_ref()));
}
I don't really understand why it needs both the dealloc and the drop_in_place. What does the dealloc add that the drop_in_place doesn't do?
Also, if I just save a raw pointer to a heap-allocated value by doing something like Box::new(5).into_raw(), does my pointer now control that memory allocation? As in, will it remain alive until I explicitly call ptr::drop_in_place()?
Finally, when I was playing with all this, I ran into a strange issue. After running ManuallyDrop::drop or ptr::drop_in_place on my raw pointer, I then tried running println! on the pointer's dereferenced value. Sometimes I get a scary heap error and my test fails, which is what I would expect. Other times, it just prints the same value, as if no drops happened. I also tried running ManuallyDrop::drop multiple times on the exact same value, and same thing. Sometimes a heap error, sometimes totally fine, and the same value prints out.
What is happening here?
If you come from C++, you can think of drop_in_place as calling the destructor manually, and dealloc as calling old C free.
They serve different purposes:
drop_in_place just calls Drop::drop, that releases the resources held by your type.
dealloc frees the memory pointed to by a pointer, previously allocated with alloc.
You seem to think that drop_in_place also frees the memory, but that is not the case. I think your confusion arises because Box<T> contains a dynamically allocated object, so its Box::drop implementation does release the memory used by that object, after calling its drop_in_place, of course.
That is what you see in the Rc implementation, first it calls the drop_in_place (destructor) of the inner object, then it releases the memory.
About what happens if you call drop_in_place several times in a row... well, the function is unsafe for a reason: you most likely get undefined behavior. From the docs:
...if T is not Copy, using the pointed-to value after calling drop_in_place can cause undefined behavior.
Note the "can cause". I think it is perfectly possible to write a type that allows calling drop several times, but it doesn't sound like such a good idea.

Julia's garbage collection of ccall allocated data

I was hoping someone could clarify one aspect of the behaviour of the Julia garbage collector and how it interacts with memory allocated by a call to a C function using ccall.
For example, I am making the following call:
setup::Ptr{Void} = ccall(("vDSP_DCT_CreateSetup", libacc), Ptr{Void},
                         (Ptr{Void}, UInt64, UInt64),
                         previous, length, dct_type)
This function allocates and initializes memory for a DFT_Setup object (the details of this are irrelevant). The library also provides a destructor, to be called on the DFT_Setup to deallocate the memory once the object is no longer needed.
Is calling the destructor necessary in Julia? i.e. Does the garbage collector handle freeing DFT_Setup when it is appropriate, or should I make a call to the C deallocator?
Yes, calling the destructor is necessary: the Julia GC can only clean up memory allocated by Julia itself; it has no knowledge of memory allocated by ccalls.
The usual way to solve this is to call the destructor from a finalizer, defined in the constructor; e.g. see RCall.jl.
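The same idea, tying the foreign destructor to an object the runtime does track, is what RAII languages do with destructors. Here is a simulated sketch in Rust (c_destroy_setup is a hypothetical stand-in for the C library's destructor, and the FREED counter only instruments the example):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls so we can observe the cleanup happening.
static FREED: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the C library's destructor (hypothetical).
fn c_destroy_setup(_raw: usize) {
    FREED.fetch_add(1, Ordering::SeqCst);
}

// Wrapper whose lifetime the language runtime *does* track;
// a Julia finalizer plays the same role for a wrapped Ptr.
struct Setup {
    raw: usize,
}

impl Drop for Setup {
    fn drop(&mut self) {
        c_destroy_setup(self.raw);
    }
}

fn main() {
    {
        let _s = Setup { raw: 0xdead };
    } // wrapper goes out of scope -> foreign destructor runs
    assert_eq!(FREED.load(Ordering::SeqCst), 1);
    println!("ok");
}
```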

How can I create an owning pointer to an unsized type?

Dealing with values of type str in Rust is clumsy because they do not implement the trait Sized. Therefore, they can only be accessed by pointer.
For my application, using ordinary pointers with lifetimes is not very helpful. Rather, I want an owning fat pointer that guarantees that the contained object will last as long as the pointer does (and no longer), but allows holding values of unknown size.
Box<T> works for an unsized T; thus Box<str>, Box<[T]> and so forth. The important distinction to note between Box<str> and String is that the latter has a capacity member as well, increasing its memory usage by one word but allowing for efficient appending as it may not need to reallocate for every push, whereas a similar method on a Box<str> would need to. The same is true of Box<[T]> versus Vec<T>, with the former being a fixed-size slice while the latter is conveniently growable. Unlike Box<str>, Box<[T]> is actually used in real life; the vec! macro uses it for efficiency, as a Box<[T]> can be written out literally and then converted to a Vec<T> at no cost.
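The one-word size difference described above is easy to verify; a short Rust check (these sizes hold on any target where usize is the pointer width):

```rust
use std::mem::size_of;

fn main() {
    // String -> Box<str>: sheds the capacity field, keeping ptr + len.
    let s: String = String::from("hello");
    let boxed: Box<str> = s.into_boxed_str();
    assert_eq!(&*boxed, "hello");

    // Vec<T> -> Box<[T]> similarly sheds the capacity word.
    let v: Vec<i32> = vec![1, 2, 3];
    let slice: Box<[i32]> = v.into_boxed_slice();
    assert_eq!(slice.len(), 3);

    // The fat pointer (ptr + len) is two words; String/Vec carry three.
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    println!("ok");
}
```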

In Fortran, what is the most memory conservative way to have an instance variable that is seldom initialized?

I need to design an object that has an instance variable of type REAL that may or may not contain a value, that is, it may be undefined in some instances.
TYPE :: object
   REAL :: result_of_some_calculation
END TYPE object
Since this instance variable may not always be defined, I wonder if there is a prudent way to avoid consuming memory for this instance variable until it is initialized. That is, I could do the following:
TYPE :: object
   REAL, POINTER :: result_of_some_calculation => NULL()
CONTAINS
   PROCEDURE :: get_calculation_result
END TYPE

SUBROUTINE get_calculation_result(self)
   IMPLICIT NONE
   CLASS(object) :: self
   REAL, TARGET :: result
   result = some_function()
   self%result_of_some_calculation => result
END SUBROUTINE get_calculation_result
So, would the pointer, when it's nullified, use less memory than when it points to a REAL?
I understand that in this case, I have the memory cost of the REAL and the memory cost of the POINTER, but I'm hoping that in the more common case when this pointer is always left nullified, I use less memory than if I used a REAL in my derived data type and left this REAL undefined.
Alternatively, I could create an allocatable array of one element:
TYPE :: object
   REAL, DIMENSION(:), ALLOCATABLE :: result_of_some_calculation
CONTAINS
   PROCEDURE :: get_calculation_result
END TYPE

SUBROUTINE get_calculation_result(self)
   IMPLICIT NONE
   CLASS(object) :: self
   REAL, TARGET :: result
   result = some_function()
   ALLOCATE(self%result_of_some_calculation(1))
   self%result_of_some_calculation(1) = result
END SUBROUTINE get_calculation_result
Would this use less memory?
In short, my question is: what are the memory costs of a nullified pointer, versus an unallocated array, versus a plain REAL (which I know to be 4 bytes, let's say)?
This is implementation specific, but a pointer to a non-polymorphic scalar object (or an allocatable for such an object) is typically implemented by a machine level pointer. If your code is compiled for and running on a platform with 32 bit (4 byte) machine level pointers, then this approach results in no memory saving for an unused object, and double the memory consumption when the object is in use, on top of any performance impacts of the indirect reference (and the aliasing potential for the pointer case).
If you are compiling for and running on a platform with 64 bit pointers, the machine level pointer is twice the size of the data being pointed at.
If you are dealing with polymorphic objects, there typically is another machine level pointer involved, to describe the dynamic type of the object being pointed at/allocated.
Implementations may have other fields in the descriptor for a scalar object.
Even more implementation specific, but a pointer to or allocatable for a non-polymorphic array object will require, in addition to the machine level pointer to the actual data, information about the bounds (or bound and extent) of the object - say 12 bytes minimum on a 32 bit platform for a rank one array. In the case of a pointer you also need to store a stride - another four bytes. Implementations typically will have additional fields for various flags and other conveniences and often use the same data structure for pointers and allocatable objects (and also assumed shape objects and polymorphic and non-polymorphic objects). As an example, one 32 bit platform I am familiar with has a 36 byte descriptor for a rank one array - nine times the storage of the fundamentally scalar thing that the descriptor might be pointing at in your use case. Eighteen times if you are on a 64 bit platform.
These are not the memory savings you are looking for.
(Note that associating a pointer with a non-saved local object of a procedure is pointless - the pointer becomes undefined when the procedure terminates.)
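For comparison, the same space tradeoff shows up in languages with explicit heap indirection. A small Rust sketch (illustrative only, not Fortran semantics): an optional, heap-allocated 4-byte value still costs a full machine pointer even when absent.

```rust
use std::mem::size_of;

fn main() {
    // The plain value: 4 bytes.
    assert_eq!(size_of::<f32>(), 4);
    // An "optional, indirected" f32: one machine pointer even when None
    // (Rust packs the None case into the null pointer), plus a separate
    // 4-byte heap allocation when present -- more storage overall, not less,
    // on any target with pointers of 4 bytes or wider.
    assert_eq!(size_of::<Option<Box<f32>>>(), size_of::<usize>());
    println!("ok");
}
```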
