How to properly free a pointer-to-pointer in Cython?

How to properly free a pointer-to-pointer in Cython? - pointers

I would like to free memory allocated a variable that is a pointer-to-pointer (or a "double pointer" -- not to be confused with pointer to a double datatype.)
Suppose the pointer-to-pointer is **ptr and (for simplicity) it is a 10x10 array of integers. Following the documentation, I see that one way of allocating space is as follows.
from cpython.mem cimport PyMem_Malloc, PyMem_Free
## Allocate memory
cdef int **ptr = <int **>PyMem_Malloc(sizeof(int *) * 10)
cdef i;
for i in range(10):
ptr[i] = <int *>PyMem_Malloc(sizeof(int) * 10)
## Free memory
for i in range(10):
PyMem_Free(<void *>ptr[i])
PyMem_Free(<void *>ptr)
The documentation does not explicity mention what is the correct way to free a pointer-to-pointer. From my experience with programming in C, I guessed that this should be the recommended way. However, the free-memory syntax does not seem to work correctly, for when I compile (using distutils.core.setup), it seems to give me a familiar gcc-error -- indicating that I'm attempting to free unallocated memory.
python(4522,0x7fffd08143c0) malloc: *** error for object 0x1172c8054: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
This question is very much the Cython analogue of this question pertaining to C pointers.

Related

cannot assign to `*x` because it is borrowed

My erroneous code snippet and compiler error info:
// code snippet 1:
0 fn main() {
1 let mut x: Box<i32> = Box::new(4);
2 let r: &Box<i32> = &x;
3 *x = 8;
4 println!("{}", r);
5 }
// compiler error info:
error[E0506]: cannot assign to `*x` because it is borrowed
--> src/main.rs:3:4
|
2 | let r = &x;
| -- borrow of `*x` occurs here
3 | *x = 8;
| ^^^^^^ assignment to borrowed `*x` occurs here
4 | println!("{}", r);
| - borrow later used here
For more information about this error, try `rustc --explain E0506`.
The following code won't compile, which makes quite senses to me cause we cannot invalidate the reference r .
// code snippet 2:
0 fn main() {
1 let mut x: i32 = 0;
2 let r: &i32 = &x;
3 x = 1;
4 println!("{}", r);
5 }
But the compiler error info of code snippet1 doesn't make too much sense to me.
x is a pointer on the stack pointing to a heap memory segment whose contents is 4 , reference r only borrows x (the pointer not the heap memory segment) , and in line 3 *x = 8; , what we did here is to alter the memory on the heap (not the pointer on the stack) . Change happens on the heap , while reference is only relevant to the stack, they do not interrelate.
This question is kind of picking a quarrel, but I do not mean to argue for the sake of argument.
If you found my question irregular, feel free to point it out :)

Change happens on the heap , while reference is only relevant to the stack, they do not interrelate.
That does not matter, because the type system doesn't work with that "depth" of information.
As far as it's concerned, borrowing x is borrowing the entirety of x up to any depth, and so any change anywhere inside x is forbidden.
For type checking purposes, this is no different than if x were a Box<Vec<_>>, and r were be actively used for iteration, leading any update to the inner vector to possibly invalidate the iterator.
(also type-wise *x = 8 does require first taking a unique reference to the box itself, before "upgrading" it to a unique reference to the box' content, as you can see from the trait implementation)

Rust's entire borrowing model enforces one simple requirement: the contents of a memory location can only be mutated if there is only one pointer through which that location can be accessed.
In your case, the heap location that you're trying to mutate can be accessed both through x and through r—and therefore mutation is denied.
This model enables the compiler to perform aggressive optimisations that permit, for example, the storage of values reachable through either alias in registers and/or caches without needing to fetch again from memory when the value is read.

The semantics of * is determined by two traits:
pub trait Deref {
type Target: ?Sized;
fn deref(&self) -> &Self::Target;
}
or
pub trait DerefMut: Deref {
fn deref_mut(&mut self) -> &mut Self::Target;
}
In your case, when you write *x = 8 Rust compiler expands the expression into the call
DerefMut::deref_mut(&mut x), because Box<T> implements Deref<Target=T> and DerefMut. That is why in the line *x = 8 mutable borrowing of x is performed, and by orphan rule it can't be done, because we've already borrowed x in let r: &Box<i32> = &x;.

I just found a great diagram from Programming Rust (Version 2), which really answers my question quite well:
In the case of my question, when x is shared-referenced by r, everything in the ownership tree of x (the stack pointer and the heap memory segment) becomes read-only.
I knew that the Stack Overflow community does not like pictures, but this diagram is really great and may help someone who will find this question in the future:)

Cython struct member scope

I have a data structure in Cython that uses a char * member.
What is happening is that the member value seems to lose its scope outside of a function that assigns a value to the member. See this example (using IPython):
[nav] In [26]: %%cython -f
...: ctypedef struct A:
...: char *s
...:
...: cdef char *_s
...:
...: cdef void fn(A *a, msg):
...: s = msg.encode()
...: a[0].s = s
...:
...: cdef A _a
...: _a.s = _s
...: fn(&_a, 'hello')
...: print(_a.s)
...: print(b'hola')
...: print(_a.s)
b'hello'
b'hola'
b"b'hola'"
It looks like _a.s is deallocated outside of fn and is being assigned any junk that is in memory that fits the slot.
This happens only under certain circumstances. For example, if I assign b'hello' to s instead of the encoded string inside fn(), the correct string is printed outside of the function.
As you can see, I also added an extra declaration for the char variable and assigned it to the struct before executing fn, to make sure that the _a.s pointer does not get out of scope. However, my suspect is that the problem is assigning the member to a variable that is in the function scope.
What is really happening here, and how do I resolve this issue?
Thanks.

Your problem is, that the pointer a.s becomes dangling in the fn-function as soon as it is created.
When calling msg.encode() the temporary byte-object s is created and the address of its buffer is saved to a.s. However, directly afterwards (i.e. at the exit from the function) the temporary bytes-object gets destroyed and the pointer becomes dangling.
Because the bytes object was small, Python's memory manager manages its memory in the arena - which guaranties that there is no segfault when you access the address (lucky you).
While the temporary object is destroyed, the memory isn't overwritten/sanatized and thus it looks as if the temporary object where still alive from A.s's point of view.
Whenever you create a new bytes-object similar in size to the temporary object, the old memory from the arena might get reused, so that your pointer a.s could point to the buffer of the newly allocated bytes-object.
Btw, would you use a[0].s = msg.encode() directly (and I guess you did), the Cython would not build and tell you, that you try to say reference to a temporary Python object. Adding an explicit reference fooled the Cython, but didn't help your case.
So what to do about it? Which solution is appropriate depends on the bigger picture, but the available strategies are:
Manage the memory of A.s. I.e. manually reserve memory, copy from the temporary object, free memory as soon as done.
Manage reference counting: Add a PyObject * to the A-struct. Assign the temporary object ot it (don't forget to increase the reference counter manually), decrease reference counter as soon as done.
Collect references of temporary objects into a pool (e.g. a list), which would keep them alive. Clear the pool as soon as objects aren't needed.
Not always the best, but easiest is the option 3 - you neither have to manage the memory not the reference counting:
%%cython
...
pool=[]
cdef void fn(A *a, msg):
s = msg.encode()
pool.append(s)
a[0].s = s
While this doesn't solve the principal problem, using PyUnicode_AsUTF8 (inspired by this answer) might be a satisfactory solution in this case:
%%cython
# it is not wrapped by `cpython.pxd`:
cdef extern from "Python.h":
const char* PyUnicode_AsUTF8(object unicode)
...
cdef void fn(A *a, msg):
a[0].s = PyUnicode_AsUTF8(msg) # msg.encode() uses utf-8 as default.
This has at least two advantages:
the pointer a[0].s is valid as long as msg is alive
calling PyUnicode_AsUTF8(msg) is faster than msg.encode(), because it reuses cached buffer, so it basically O(1) after first call, while msg.encode() needs at least copy the memory and is O(n) with n-number of characters.

I don't understand the result of this little program

i've made this little program to test a little part of a bigger program.
int main()
{
char c[]="ddddddddddddd";
char *g= malloc(4*sizeof(char));
*g=NULL;
strcpy (g,c);
printf("Hello world %s!\n",g);
return 0;
}
I expected that the function would return "Hello World dddd" ,since the length of g is 4*sizeof(char), but it returns " Hello World ddddddddddddd ".Can you explain me Where I'm wrong ?

Don't do that, it's undefined behaviour.
The strcpy function will happily copy all those characters in c regardless of the size of g.
That's because it copies characters up to the first \0 in c. In this particular case it may corrupt your heap, or it may not, depending on the minimum size of things that get allocated in the heap (many have a "resolution" of sixteen bytes for example).
There are other functions you can use (though they're optional) if you want your code to be more robust, such as strncpy (provided you understand the limitations), or strcpy_s(), as detailed in Appendix K of the ISO C11 standard (and earlier iterations as well).
Or, if you can't use those for some reason, it's up to the developer to ensure they don't break the rules.

Fortran 90 function return pointer

I saw this question:
Fortran dynamic objects
and the accepted answer made me question if I wrote the following function safely (without allowing a memory leak)
function getValues3D(this) result(vals3D)
implicit none
type(allBCs),intent(in) :: this
real(dpn),dimension(:,:,:),pointer :: vals3D
integer,dimension(3) :: s
if (this%TF3D) then
s = shape(this%vals3D)
if (associated(this%vals3D)) then
stop "possible memory leak - p was associated"
endif
allocate(vals3D(s(1),s(2),s(3)))
vals3D = this%vals3D
else; call propertyNotAssigned('vals3D','getValues3D')
endif
end function
This warning shows up when I run my code, but shouldn't my this%vals3D be associated if it was previously (to this function) set? I'm currently running into memory errors, and they started showing up when I introduced a new module with this function in it.
Any help is greatly appreciated.
I think I wasn't specific enough. I would like to make the following class, and know how to implement the class, safely in terms of memory. That is:
module vectorField_mod
use constants_mod
implicit none
type vecField1D
private
real(dpn),dimension(:),pointer :: x
logical :: TFx = .false.
end type
contains
subroutine setX(this,x)
implicit none
type(vecField1D),intent(inout) :: this
real(dpn),dimension(:),target :: x
allocate(this%x(size(x)))
this%x = x
this%TFx = .true.
end subroutine
function getX(this) result(res)
implicit none
real(dpn),dimension(:),pointer :: res
type(vecField1D),intent(in) :: this
nullify(res)
allocate(res(size(this%x)))
if (this%TFx) then
res = this%x
endif
end function
end module
Where the following code tests this module
program testVectorField
use constants_mod
use vectorField_mod
implicit none
integer,parameter :: Nx = 150
real(dpn),parameter :: x_0 = 0.0
real(dpn),parameter :: x_N = 1.0
real(dpn),parameter :: dx = (x_N - x_0)/dble(Nx-1)
real(dpn),dimension(Nx) :: x = (/(x_0+dble(i)*dx,i=0,Nx-1)/)
real(dpn),dimension(Nx) :: f
real(dpn),dimension(:),pointer :: fp
type(vecField1D) :: f1
integer :: i
do i=1,Nx
f(i) = sin(x(i))
enddo
do i=1,10**5
call setX(f1,f) !
f = getX(f1) ! Should I use this?
fp = getX(f1) ! Or this?
fp => getX(f1) ! Or even this?
enddo
end program
Currently, I'm running on windows. When I CTR-ALT-DLT, and view performance, the "physical memory usage histery" increases with every loop iteration. This is why I assume that I have a memory leak.
So I would like to repose my question: Is this a memory leak? (The memory increases with every one of the above cases). If so, is there a way I avoid the memory leak while still using pointers? If not, then what is happening, should I be concerned and is there a way to reduce the severity of this behavior?
Sorry for the initial vague question. I hope this is more to the point.

Are you really restricted to Fortran 90? In Fortran 2003 you would use an allocatable function result for this. This is much safer. Using pointer function results, whether you have a memory leak with this code or not depends on how you reference the function, which you don't show. If you must return a pointer from a procedure, it is much safer to return it via a subroutine argument.
BUT...
This function is pointless. There's no point testing the association status of this%vals3D` after you've referenced it as the argument to SHAPE in the previous line. If the pointer component is disassocated (or has undefined pointer association status), then you are not permitted to reference it.
Further, if the pointer component is associated, all you do is call stop!
Perhaps you have transcribed the code to the question incorrectly?
If you simply delete the entire if construct starting with if (associated(this%vals3D))... then your code may make sense.
BUT...
if this%TF3D is true, then this%vals3D must be associated.
when you reference the function, you must use pointer assignment
array_ptr => getValues3D(foo)
! ^
! |
! + this little character is very important.
Forget that little character and you are using normal assignment. Syntactically valid, difficult to pick the difference when reading code and, in this case, potentially a source of memory corruption or leaks that might go undetected until the worst possible moment, in addition to the usual pitfalls of using pointers (e.g. you need to DEALLOCATE array_ptr before you reuse it or it goes out of scope). This is why functions returning pointer results are considered risky.
Your complete code shows several memory leaks. Every time you allocate something that is a POINTER - you need to pretty much guarantee that there will be a matching DEALLOCATE.
You have a loop in your test code. ALLOCATE gets called a lot - in both the setter and the getter. Where are the matching DEALLOCATE statements?

Every time setX is called, any previously allocated memory for the x component of your type will be leaked. Since you call the function 10^5 times, you will waste 100000-1 copies. If you know that the size of this%x will never change, simply check to see if a previous call had already allocated the memory by checking to see if ASSOCIATED(this%x) is true. If it is, skip the allocation and move directly to the assignment statement. If the size does change, then you will first have to deallocate the old copy before allocating new space.
Two other minor comments on setX: The TARGET attribute of the dummy argument x appears superfluous since you never take a pointer of that argument. Second, the TFx component of your type also seems superfluous since you can instead check if x is allocated.
For the function getX, why not skip the allocation completely, and merely set res => this%x? Admittedly, this will return a direct reference to the underlying data, which maybe you want to avoid.
In your loop,
do i=1,10**5
call setX(f1,f) !
f = getX(f1) ! Should I use this?
fp = getX(f1) ! Or this?
fp => getX(f1) ! Or even this?
enddo
fp => getX(f1) will allow you to obtain a pointer to the underlying x component of your type (if you adopt my change above). The other two use assignment operators and will copy data from the result of getX into either f, or (if it is previously allocated) fp. If fp is not allocated, the code will crash.
If you do not want to grant direct access to the underlying data, then I suggest that the return value of getX should be defined as an automatic array with the size determined by this%x. That is, you can write the function as
function getX(this) result(res)
implicit none
type(vecField1D),intent(in) :: this
real(dpn),dimension(size(this%x,1)) :: res
res = this%x
end function

Fortran Unhandled Exception (msvcr100d.dll)

I'm getting this unhandled exception when I exit my program:
Unhandled exception at 0x102fe274 (msvcr100d.dll) in Parameters.exe: 0xC0000005: Access violation reading location 0x00000005.
The debugger stops in a module called crtdll.c on this line:
onexitbegin_new = (_PVFV *) DecodePointer(__onexitbegin);
The top line on the call stack reads:
msvcr100d.dll!__clean_type_info_names_internal(__type_info_node * p_type_info_root_node=0x04a6506c) Line 359 + 0x3 bytes C++
The program then remains in memory until I close down the IDE.
I'm more used to developing with managed languages so I expect I'm doing something wrong with my code maintenance. The code itself reads a memory mapped file and assoiciates it with pointers:
SUBROUTINE READ_MMF ()
USE IFWIN
USE, INTRINSIC :: iso_c_binding
USE, INTRINSIC :: iso_fortran_env
INTEGER(HANDLE) file_mapping_handle
INTEGER(LPVOID) memory_location
TYPE(C_PTR) memory_location_cptr
INTEGER memory_size
INTEGER (HANDLE) file_map
CHARACTER(5) :: map_name
TYPE(C_PTR) :: cdata
integer :: n = 3
integer(4), POINTER :: A, C
real(8), POINTER :: B
TYPE STRUCT
integer(4) :: A
real(8) :: B
integer(4) :: C
END TYPE STRUCT
TYPE(STRUCT), pointer :: STRUCT_PTR
memory_size = 100000
map_name = 'myMMF'
file_map = CreateFileMapping(INVALID_HANDLE_VALUE,
+ NULL,
+ PAGE_READWRITE,
+ 0,
+ memory_size,
+ map_name // C_NULL_CHAR )
memory_location = MapViewOfFile(file_map,
+IOR(FILE_MAP_WRITE, FILE_MAP_READ),
+ 0, 0, 0 )
cdata = TRANSFER(memory_location, memory_location_cptr)
call c_f_pointer(cdata, STRUCT_PTR, [n])
A => STRUCT_PTR%A
B => STRUCT_PTR%B
C => STRUCT_PTR%C
RETURN
END
Am I supposed to deallocate the c-pointers when I'm finished with them? I looked into that but can't see how I do it in Fortran...
Thanks for any help!

The nature of the access violation (during runtime library cleanup) suggests that your program is corrupting memory in some way. There are a number of programming errors that can lead to that - and the error or errors responsible could be anywhere in your program. The usual "compile and run with all diagnostic and debugging options enabled" approach may help identify these.
That said, there is a programming error in the code example shown. The C_F_POINTER procedure from the ISO_C_BINDING intrinsic module can operate on either scalar or array Fortran pointers (the second argument). If the Fortran pointer is a scalar then the third "shape" argument must not be present (it must be present if the Fortran pointer is an array).
Your code breaks this requirement - the Fortran pointer STRUCT_PTR in your code is a scalar, but yet you provide the third shape argument (as [n]). It is quite plausible that this error will result in memory corruption - typically the implementation of C_F_POINTER would try and populate a descriptor in memory for the Fortran pointer, and the descriptor for a pointer to an array may be very different from a pointer to a scalar.
Subsequent references to STRUCT_PTR may further the corruption.
While it is not required by the standard to diagnose this situation, I am a little surprised that the compiler does not issue a diagnostic (assuming you example code is what you actually are compiling). If you reported this to your compiler's vendor (Intel, presumably given IFWIN etc) I suspect they would regard it as a deficiency in their compiler.
To release the memory associated with the file mapping you use the UnmapViewOfFile and CloseHandle API's. To use these you should "store" (your program needs to remember in some way) the base address (memory_location, which can also be obtained by calling C_LOC on STRUCT_PTR once the problem above is fixed) returned by MapViewOfFile, and the handle to the mapping (file_map) returned by CreateFileMapping; respectively.

I've only ever done this with Cray Pointers: not with the ISO bindings and I know it does work with Cray Pointers.
What you don't say is whether this is happening the first time or second time the routine is being called. If it is called more than once, then there is a problem in the coding in that Create/OpenFileMapping should only be called once to get a handle.
You don't need to deallocate memory because the memory is not yours to deallocate: you need to call UnmapViewOfFile(memory_location). After you have called this, memory_location, memory_location_cptr and possibly cdata are no longer valid.
The way this works is with two or more programs:
One program calls CreateFileMapping, the others calls OpenFileMapping to obtain a handle to the data. This only needs to be called once at the start of the program: not every time you need to access the file. Multiple calls to Create/OpenFileMapping without a corresponding close can cause crashes.
They then call MapViewOfFile to map the file into memory. Note that only one program can do this at a time. When the program is finished with the memory file, it calls UnmapViewOfFile. The other program can now get to the file. There is a blocking mechanism. If you do not call UnmapViewOfFile, other programs using MapViewOfFile will be blocked.
When all is done, call close on the handle created by Create/OpenFileMapping.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex