I have a data structure in Cython that uses a char * member.
The member's value seems to be lost once the function that assigns it returns. See this example (using IPython):
[nav] In [26]: %%cython -f
...: ctypedef struct A:
...:     char *s
...:
...: cdef char *_s
...:
...: cdef void fn(A *a, msg):
...:     s = msg.encode()
...:     a[0].s = s
...:
...: cdef A _a
...: _a.s = _s
...: fn(&_a, 'hello')
...: print(_a.s)
...: print(b'hola')
...: print(_a.s)
b'hello'
b'hola'
b"b'hola'"
It looks as if _a.s is deallocated once fn returns, and then picks up whatever junk in memory happens to fit the slot.
This happens only under certain circumstances. For example, if I assign b'hello' to s instead of the encoded string inside fn(), the correct string is printed outside of the function.
As you can see, I also added an extra declaration for the char pointer and assigned it to the struct before calling fn, to make sure that the _a.s pointer does not go out of scope. However, my suspicion is that the problem is assigning the member from a variable that lives in the function scope.
What is really happening here, and how do I resolve this issue?
Your problem is that the pointer a.s becomes dangling in fn almost as soon as it is set.
When msg.encode() is called, a temporary bytes object s is created and the address of its buffer is stored in a.s. Directly afterwards (i.e. when the function exits), the temporary bytes object is destroyed and the pointer is left dangling.
Because the bytes object is small, Python's memory manager serves it from one of its arenas, which is why accessing the stale address does not segfault here (lucky you).
Although the temporary object is destroyed, its memory isn't overwritten or sanitized, so from a.s's point of view it looks as if the temporary object were still alive.
Whenever a new bytes object of similar size is created, the old arena memory may be reused, so a.s can end up pointing into the buffer of the newly allocated bytes object. That is consistent with the third print() showing b"b'hola'".
By the way, if you had written a[0].s = msg.encode() directly (and I guess you did), Cython would refuse to compile it and tell you that you are trying to store a reference to a temporary Python object. Adding an explicit variable fooled Cython, but didn't help your case.
So what to do about it? Which solution is appropriate depends on the bigger picture, but the available strategies are:
1. Manage the memory of A.s yourself, i.e. manually allocate memory, copy the data from the temporary object, and free the memory as soon as you are done.
2. Manage reference counting: add a PyObject * member to the A struct and assign the temporary object to it (don't forget to increase the reference count manually), then decrease the reference count as soon as you are done.
3. Collect references to the temporary objects in a pool (e.g. a list), which keeps them alive. Clear the pool as soon as the objects aren't needed anymore.
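As a rough analogue of the second strategy, the same ownership problem can be reproduced in plain Python with ctypes. Assigning a raw integer address to a c_char_p field tells ctypes nothing about who owns the bytes, just like assigning a char * in Cython. The sketch below assumes CPython's ctypes module. Instead of a PyObject * with manual reference counting, it keeps the owning buffer next to the struct. The class names A and OwnedA are hypothetical.

```python
import ctypes


class A(ctypes.Structure):
    # Hypothetical mirror of the Cython struct: a single char* member.
    _fields_ = [("s", ctypes.c_char_p)]


class OwnedA:
    """Pairs the C struct with the Python object that owns the pointed-to bytes."""

    def __init__(self) -> None:
        self.c = A()
        self._owner = None  # plays the role of the extra PyObject * member

    def set(self, msg: str) -> None:
        # create_string_buffer copies the encoded bytes into a NUL-terminated
        # buffer that is owned by the returned ctypes array object.
        owner = ctypes.create_string_buffer(msg.encode())
        # Storing a bare integer address carries no ownership information,
        # so `owner` must be kept alive explicitly (next line).
        self.c.s = ctypes.addressof(owner)
        self._owner = owner  # the previous buffer may now be freed

    def get(self) -> bytes:
        # Reading a c_char_p field copies the bytes up to the first NUL.
        return self.c.s
```

Because the owner is stored as an attribute, the buffer lives as long as the wrapper (or until the next set call). The reference counts are never touched by hand.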
Not always the best, but the easiest is option 3: you have to manage neither the memory nor the reference counting:
%%cython
...
pool = []
cdef void fn(A *a, msg):
    s = msg.encode()
    pool.append(s)  # the pool keeps the bytes object (and thus a[0].s) alive
    a[0].s = s
While this doesn't solve the principal problem, using PyUnicode_AsUTF8 (inspired by this answer) might be a satisfactory solution in this case:
%%cython
# it is not wrapped by `cpython.pxd`:
cdef extern from "Python.h":
    const char* PyUnicode_AsUTF8(object unicode)
...
cdef void fn(A *a, msg):
    a[0].s = PyUnicode_AsUTF8(msg)  # msg.encode() uses UTF-8 by default.
This has at least two advantages:
the pointer a[0].s is valid as long as msg is alive
calling PyUnicode_AsUTF8(msg) is faster than msg.encode() because it reuses a cached buffer. It is basically O(1) after the first call, whereas msg.encode() has to at least copy the memory and is O(n), with n the number of characters.
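The caching behaviour can be observed from plain CPython through ctypes.pythonapi, which exposes the C API of the running interpreter. The minimal sketch below assumes CPython. Two calls on the same str return the same address, and that address holds the NUL-terminated UTF-8 bytes. The helper name utf8_address is hypothetical.

```python
import ctypes

# Bind the C-API function of the running CPython interpreter.
# pythonapi is a PyDLL, so the GIL stays held during the call.
PyUnicode_AsUTF8 = ctypes.pythonapi.PyUnicode_AsUTF8
PyUnicode_AsUTF8.argtypes = [ctypes.py_object]
PyUnicode_AsUTF8.restype = ctypes.c_void_p  # raw address, no bytes object is created


def utf8_address(msg: str) -> int:
    """Return the address of msg's cached UTF-8 buffer; valid only while msg is alive."""
    return PyUnicode_AsUTF8(msg)


msg = "héllo"  # non-ASCII, so CPython builds a UTF-8 copy on the first call and caches it
first = utf8_address(msg)
second = utf8_address(msg)  # served from the cache
```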
So I have a struct:
typedef struct {
int x = 0;
} Command;
and global vectors:
vector<Command> cmdList = {}; vector<Event*> eventList = {};
I push_back, erase and clear the vector in another .cpp file. Its contents get copied into the member
vector<Command> cmdsToExec = {}; of each Event struct that is created. I use this to push_back:
eventList.push_back( new Event() ); eventList[int( eventList.size() ) - 1]->cmdsToExec = cmdList;
My problems are that A) these Event*s can't be erased with delete, and B) Valgrind gives this error while trying to determine the size of cmdsToExec:
==25096== Invalid read of size 8
==25096== at 0x113372: std::vector<Command, std::allocator<Command> >::size() const (stl_vector.h:919)
==25096== by 0x11C1C7: eventHandler::processEvent() (eventHandler.cpp:131)
==25096== by 0x124590: main (main.cpp:88)
==25096== Address 0x630a9e0 is 32 bytes inside a block of size 56 free'd
==25096== at 0x484BB6F: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25096== by 0x11C116: eventHandler::processEvent() (eventHandler.cpp:222)
==25096== by 0x124590: main (main.cpp:88)
==25096== Block was alloc'd at
==25096== at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25096== by 0x11B4A5: eventHandler::createEvent() (eventHandler.cpp:58)
==25096== by 0x11B412: eventHandler::doState() (eventHandler.cpp:41)
==25096== by 0x124575: main (main.cpp:83)
I've tracked it to the line:
while( int( eventList[0]->cmdsToExec.size() ) > 0 ) {
I'm not trying to solve this specific problem; it's more about how to properly delete and deallocate a dynamically allocated pointer from a global vector of such pointers. That being said, there are no objects (and I want to keep it that way). Will I need a struct deconstructor (no pun intended)? Also, judging by this error message, I don't believe the cmdList vector ever leaks memory, especially since I'm clearing it all at once.
My thoughts on fixing this are to place both global vectors into my main() function and pass them into the program from there. I thought it would be unnecessary to do this and would slow the program down. Thinking now, I guess it wouldn't.
My guess is that this is a problem related to the order of destruction of static/global objects.
C++ guarantees that within a given translation unit (i.e., a .cpp source file), static/global objects are created in the order in which they are defined, and destroyed in the reverse order.
C++ gives no guarantee between different translation units.
My recommendations are:
Avoid statics/globals. Move them to be class members if possible.
If you have any dependencies between statics/globals then put them all in the same source file so that you have control over the order of their creation and destruction.
This question continues the story of this one.
For some reason I'd like to manage memory manually in my program: allocate memory, work with it, and free it. The first and the last steps are quite easy. However, I run into trouble with pointer dereferencing.
Suppose I have this struct:
mutable struct Point
x::Float64
y::Float64
end
I'd like to allocate some memory and store my Point there. I could just write point = Point(0.,0.), but that prevents me from freeing the allocated memory manually and leaves it to the GC. So I write
ptr = convert(Ptr{Point}, Base.Libc.malloc(sizeof(Point))) # N.B. sizeof(Point) == 16
At this point I want to write into the memory pointed to by ptr. I found some hacky ways to do it.
Use unsafe_store!. This method doesn't suit me, because I first have to allocate memory for the object to be written (Point(0.,0.)) and only then copy it into ptr.
unsafe_store!(ptr, Point(0.,0.))
Use unsafe_wrap. This could be a good idea, but I cannot just wrap a piece of memory into an array of a mutable type. That's because an array of mutable structs contains only sequential pointers to those structs, not the structs themselves. Since Point has two fields of type Float64, I can wrap it with type Tuple{Float64,Float64} and then assign values inside the wrapped array. But I still have to store (0.,0.) somewhere first, so this is also just copying. And of course I cannot assign to the fields of an immutable type. So on the one hand I cannot wrap a mutable type, on the other hand I cannot modify an immutable one (and on the third hand I cannot reinterpret them into each other).
ptr2 = convert(Ptr{Tuple{Float64,Float64}}, ptr)
arr = unsafe_wrap(Array, ptr2, 1)
arr[1] = (0.,0.)
Finally, I can brutally convert everything to primitive types, and this really does avoid additional allocation and copying. But at this point I have to ask myself why I need the declaration of type Point at all if I'm working directly with its internals, guts and bones. (What I wanted was something like p.x = p.y = 0.)
ptr3 = convert(Ptr{Float64}, ptr)
arr = unsafe_wrap(Array, ptr3, 2)
arr[1] = arr[2] = 0.
The least hacky and the least working (not working at all) method: I found a nice function called unsafe_pointer_to_objref, and at first sight it should do the whole job. But it doesn't; it behaves quite strangely. As the docs (source code) say:
unsafe_pointer_to_objref(p::Ptr)
Convert a Ptr to an object reference. Assumes the pointer refers to a valid heap-allocated Julia object. If this is not the case, undefined behavior results, hence this function is considered "unsafe" and should be used with care.
So I have a heap-allocated Point with ptr pointing to it. I can even unsafe_load its value (but unsafe_load makes a copy, so that is not the solution), and I should be able to do something like
point = unsafe_pointer_to_objref(ptr)
But this leads to
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x278d238 -- typekeyvalue_hash at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1160 [inlined]
lookup_typevalue at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:722
in expression starting at none:0
typekeyvalue_hash at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1160 [inlined]
lookup_typevalue at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:722
jl_inst_arg_tuple_type at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1589
jl_f_tuple at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:786 [inlined]
jl_f_tuple at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:781
indexed_iterate at .\pair.jl:37 [inlined]
indexed_iterate at .\pair.jl:37
jfptr_indexed_iterate_29996.clone_1 at C:\Users\user\AppData\Local\Programs\Julia-1.7.0\lib\julia\sys.dll (unknown line)
print_response at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:281
#45 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:275
jfptr_YY.45_50669.clone_1 at C:\Users\user\AppData\Local\Programs\Julia-1.7.0\lib\julia\sys.dll (unknown line)
with_repl_linfo at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:508
print_response at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:273
do_respond at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:844
jfptr_do_respond_48815.clone_1 at C:\Users\user\AppData\Local\Programs\Julia-1.7.0\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_interface at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\LineEdit.jl:2493
jfptr_run_interface_49569.clone_1 at C:\Users\user\AppData\Local\Programs\Julia-1.7.0\lib\julia\sys.dll (unknown line)
run_frontend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:1230
#49 at .\task.jl:423
jfptr_YY.49_48790.clone_1 at C:\Users\user\AppData\Local\Programs\Julia-1.7.0\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:877
Allocations: 2721 (Pool: 2712; Big: 9); GC: 0
However, it works fine in combination with pointer_from_objref (which returns Ptr{Nothing}), so I think unsafe_pointer_to_objref works only for Julia-owned objects:
a === unsafe_pointer_to_objref(pointer_from_objref(a))
Finally, it seems that PointerArithmetic.jl uses unsafe_load to dereference pointers, which is not a solution for the reason mentioned above.
So there are two questions:
Main: How do I dereference a pointer to a mutable struct and obtain the object located where the pointer points?
Secondary: Why doesn't unsafe_pointer_to_objref give a result here, and what is it for?
If you don't need a C interface for anything but memory allocation, but you don't want the garbage collector to manage your structs, why not just preallocate your own heap?
Consider
struct MyStruct
i::Int
j::Float64
MyStruct(i = 0, j = 0.0) = new(i, j)
end
const heapofMyStruct = [MyStruct() for _ in 1:1000]
m = heapofMyStruct[1]
You will need to keep track of where you are "allocating" the next item in your heap. Since heapofMyStruct is a const global that stays referenced, it will not be garbage collected while it is in use.
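That bookkeeping can be done with a stack of free slot indices. The sketch below is in Python rather than Julia, and the Heap class with its alloc and release methods is hypothetical. Slots are created once up front and reset in place on release, so the pool itself never hands memory back to the garbage collector.

```python
from dataclasses import dataclass


@dataclass
class MyStruct:
    # Hypothetical payload with the same fields as the Julia MyStruct.
    i: int = 0
    j: float = 0.0


class Heap:
    """Fixed-size pool: every slot is allocated once and then reused by index."""

    def __init__(self, capacity: int) -> None:
        self.slots = [MyStruct() for _ in range(capacity)]
        # Stack of free slot indices; popping yields the lowest index first.
        self.free = list(range(capacity - 1, -1, -1))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop()

    def release(self, idx: int) -> None:
        slot = self.slots[idx]
        slot.i, slot.j = 0, 0.0  # reset in place instead of creating a new object
        self.free.append(idx)
```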
I want to get the data pointer of a string variable (like string::c_str() in C++) to pass to a C function, and I found that this doesn't work:
package main
/*
#include <stdio.h>
void Println(const char* str) {printf("%s\n", str);}
*/
import "C"
import (
"unsafe"
)
func main() {
s := "hello"
C.Println((*C.char)(unsafe.Pointer(&(s[0]))))
}
The compile error is: 'cannot take the address of s[0]'.
The following works, but I suspect it causes an unnecessary memory allocation and copy. Is there a better way to get the data pointer?
C.Println((*C.char)(unsafe.Pointer(&([]byte(s)[0]))))
There are ways to pass the underlying data of a Go string to C without copying it, but it will not work as a C string, because it is not a C string. Your printf will not work even if you manage to extract the pointer, even though it may happen to work sometimes. Go strings are not C strings. They used to be, for compatibility, back when Go relied more on libc; they aren't anymore.
Just follow the cgo manual and use C.CString, and release the result with C.free, since the copy is allocated with malloc. If you're fighting for efficiency, you'll win much more by not using cgo at all, because the overhead of calling into C is much bigger than allocating some memory and copying a string.
(*reflect.StringHeader)(unsafe.Pointer(&sourceTail)).Data
Strings in Go are not NUL-terminated, so you should always pass both the Data pointer and the length to the corresponding C functions. The C standard library has a family of functions for dealing with such strings. For example, to format one with printf, use the format specifier %.*s instead of %s, and pass both the length (as an int) and the pointer in the argument list.
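To see why a bare pointer is not enough, here is a small sketch in Python's ctypes (not cgo) that reads a buffer with no terminating NUL by pairing an address with a length. The buffer contents and variable names are made up for illustration. ctypes.string_at(address, size) plays the role of printf's %.*s.

```python
import ctypes

# Ten bytes with no terminating NUL, standing in for the bytes behind a Go string.
buf = ctypes.create_string_buffer(b"helloworld", 10)
data = ctypes.addressof(buf)  # the "Data" pointer
length = 5                    # the "Len" of a substring such as "hello"

# Length-delimited read: copies exactly `length` bytes starting at `data`,
# the same contract as printf("%.*s", length, data) in C.
view = ctypes.string_at(data, length)
rest = ctypes.string_at(data + length, len(buf) - length)
```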
I have a code block that queries AD, retrieves the results, and writes them to a channel.
func GetFromAD(connect *ldap.Conn, ADBaseDN, ADFilter string, ADAttribute []string, ADPage uint32) *[]ADElement {
searchRequest := ldap.NewSearchRequest(ADBaseDN, ldap.ScopeWholeSubtree, ldap.NeverDerefAliases, 0, 0, false, ADFilter, ADAttribute, nil)
sr, err := connect.SearchWithPaging(searchRequest, ADPage)
CheckForError(err)
fmt.Println(len(sr.Entries))
ADElements := []ADElement{}
for _, entry := range sr.Entries{
NewADEntity := new(ADElement) //struct
NewADEntity.DN = entry.DN
for _, attrib := range entry.Attributes {
NewADEntity.attributes = append(NewADEntity.attributes, keyvalue{attrib.Name: attrib.Values})
}
ADElements = append(ADElements, *NewADEntity)
}
return &ADElements
}
The above function returns a pointer to []ADElement.
And in my initialrun function, I call this function like
ADElements := GetFromAD(connectAD, ADBaseDN, ADFilter, ADAttribute, uint32(ADPage))
fmt.Println(reflect.TypeOf(ADElements))
ADElementsChan <- ADElements
And the output says
*[]somemodules.ADElement
as the output of reflect.TypeOf.
My question is this:
since ADElements := []ADElement{} in GetFromAD() is a local variable, it must be allocated on the stack. When GetFromAD() returns, the contents of the stack should be destroyed, and any further references to what GetFromAD() returned should point to invalid memory. Yet I still get the exact number of elements returned by GetFromAD(), without any segfault. How does this work? Is it safe to do it this way?
Yes, it is safe, because the Go compiler performs escape analysis and allocates such variables on the heap.
Check out the FAQ entry "How do I know whether a variable is allocated on the heap or the stack?":
The storage location does have an effect on writing efficient programs. When possible, the Go compilers will allocate variables that are local to a function in that function's stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.
Define "safe"...
You will not end up freeing the memory of ADElements, since there's at least one live reference to it.
In this case, you should be completely safe, since you're only passing the slice once and then you seem to not modify it, but in the general case it might be better to pass it element-by-element across a chan ADElement, to avoid multiple unsynchronized accesses to the slice (or, more specifically, the array backing the slice).
This also holds for maps, where you can get curious problems if you pass a map over a channel, then continue to access it.
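The element-by-element hand-off can be sketched outside Go as well. Below, Python's queue.Queue stands in for a chan ADElement, and a sentinel object stands in for close(). The element dictionaries are made-up stand-ins for ADElement values. Each element is sent as a shallow copy, much as sending a struct value over a Go channel copies the struct but not the arrays behind any slice fields.

```python
import queue
import threading

# Made-up stand-ins for ADElement values; only the DN field is modelled.
elements = [{"DN": f"CN=user{i},DC=example,DC=com"} for i in range(3)]
DONE = object()  # sentinel that plays the role of close(ch)


def producer(items, ch):
    for item in items:
        ch.put(dict(item))  # send a shallow copy of each element, not the shared list
    ch.put(DONE)


def consume(ch):
    received = []
    while True:
        item = ch.get()
        if item is DONE:
            return received
        received.append(item)


ch = queue.Queue()  # stands in for a chan ADElement
worker = threading.Thread(target=producer, args=(elements, ch))
worker.start()
received = consume(ch)
worker.join()
```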
My goal is to call Windows' GetModuleInformation function to get a MODULEINFO struct back. This is all working fine. The problem comes as a result of me wanting to do pointer arithmetic and dereferences on the LPVOID lpBaseOfDll which is part of the MODULEINFO.
Here is my code to call the function in Lua:
require "luarocks.require"
require "alien"
sizeofMODULEINFO = 12 --Gotten from sizeof(MODULEINFO) from Visual Studio
MODULEINFO = alien.defstruct{
{"lpBaseOfDll", "pointer"}; --Does this need to be a buffer? If so, how?
{"SizeOfImage", "ulong"};
{"EntryPoint", "pointer"};
}
local GetModuleInformation = alien.Kernel32.K32GetModuleInformation
GetModuleInformation:types{ret = "int", abi = "stdcall", "long", "pointer", "pointer", "ulong"}
local GetModuleHandle = alien.Kernel32.GetModuleHandleA
GetModuleHandle:types{ret = "pointer", abi = "stdcall", "pointer"}
local GetCurrentProcess = alien.Kernel32.GetCurrentProcess
GetCurrentProcess:types{ret = "long", abi = "stdcall"}
local mod = MODULEINFO:new() --Create struct (needs buffer?)
local currentProcess = GetCurrentProcess()
local moduleHandle = GetModuleHandle("myModule.dll")
local success = GetModuleInformation(currentProcess, moduleHandle, mod(), sizeofMODULEINFO)
if success == 0 then --If there is an error, exit
return 0
end
local dataPtr = mod.lpBaseOfDll
--Now how do I do pointer arithmetic and/or dereference "dataPtr"?
At this point, mod.SizeOfImage seems to be giving me the correct values that I am expecting, so I know the functions are being called and the struct is being populated. However, I cannot seem to do pointer arithmetic on mod.lpBaseOfDll because it is a UserData.
The only information in the Alien Documentation that may address what I'm trying to do are these:
Pointer Unpacking
Alien also provides three convenience functions that let you dereference a pointer and convert the value to a Lua type:
alien.tostring takes a userdata (usually returned from a function that has a pointer return value), casts it to char*, and returns a Lua string. You can supply an optional size argument (if you don’t, Alien calls strlen on the buffer first).
alien.toint takes a userdata, casts it to int*, dereferences it and returns it as a number. If you pass it a number it assumes the userdata is an array with this number of elements.
alien.toshort, alien.tolong, alien.tofloat, and alien.todouble are like alien.toint, but work with the respective typecasts.
Unsigned versions are also available.
My issue with those is that I would need to go byte-by-byte, and there is no alien.tochar function. Also, and more importantly, this still doesn't let me get at data beyond the base address.
Buffers
After making a buffer you can pass it in place of any argument of string or pointer type.
...
You can also pass a buffer or other userdata to the new method of your struct type, and in this case this will be the backing store of the struct instance you are creating. This is useful for unpacking a foreign struct that a C function returned.
These seem to suggest I can use an alien.buffer as the argument of MODULEINFO's LPVOID lpBaseOfDll. And buffers are described as byte arrays, which can be indexed using this notation: buf[1], buf[2], etc. Additionally, buffers go by bytes, so this would ideally solve all problems. (If I am understanding this correctly).
Unfortunately, I cannot find any examples of this anywhere (not in the docs, on Stack Overflow, Google, etc.), so I have no idea how to do this. I've tried a few variations of syntax, but nearly every one gives a runtime error (the others simply don't work as expected).
Any insight on how I might be able to go byte-by-byte (C char-by-char) across the mod.lpBaseOfDll through dereferences and pointer arithmetic?
I need to go byte-by-byte, and there is no alien.tochar function.
Sounds like alien.tostring has you covered:
alien.tostring takes a userdata (usually returned from a function that has a pointer return value), casts it to char*, and returns a Lua string. You can supply an optional size argument (if you don’t Alien calls strlen on the buffer first).
Lua strings can contain arbitrary byte values, including 0 (i.e. they aren't null-terminated like C strings), so as long as you pass a size argument to alien.tostring you can get back data as a byte buffer, aka Lua string, and do whatever you please with the bytes.
It sounds like you can't tell it to start at an arbitrary offset from the given pointer address. The easiest way to tell for sure, if the documentation doesn't tell you, is to look at the source. It would probably be trivial to add an offset parameter.
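Once the base pointer is available as a plain integer address, pointer arithmetic is just integer addition followed by a sized read. A sized read also avoids stopping at the first zero byte the way strlen would. The sketch below shows the idea with Python's ctypes rather than Alien and does not use any Alien API. ctypes.string_at(address, size) copies size bytes from address. The read_bytes helper and the demo buffer are hypothetical stand-ins for lpBaseOfDll and the module image.

```python
import ctypes


def read_bytes(base: int, offset: int, size: int) -> bytes:
    """Copy `size` bytes starting at address base + offset."""
    return ctypes.string_at(base + offset, size)


# Demo buffer standing in for the module image at lpBaseOfDll;
# PE images begin with the "MZ" signature.
image = ctypes.create_string_buffer(b"MZ\x90\x00\x03\x00", 6)
base = ctypes.addressof(image)

signature = read_bytes(base, 0, 2)
byte_at_2 = read_bytes(base, 2, 1)[0]  # indexing bytes yields an int
```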