Where is the function `_PyIO_str_readline` implemented in CPython?

From the question "How is file implemented?" I learned how the open() function is implemented, but I cannot find where the _PyIO_str_readline function used in its implementation is defined.
https://docs.python.org/3/c-api/object.html
https://github.com/python/cpython/search?q=_PyIO_str_readline

Your problem is that you assume _PyIO_str_readline is a function, but it is actually just a global variable (of type PyObject *), which is declared here:
extern PyObject *_PyIO_str_readline;
and defined here:
PyObject *_PyIO_str_readline = NULL;
i.e. it is initialized to NULL, but, as the name suggests, it is meant to hold a string object (i.e. unicode in Python 3 or bytes in Python 2).
_PyIO_str_readline is a kind of cache (in CPython usually called an "interned string"; see PyUnicode_InternFromString), so that every time PyObject_CallMethodObjArgs is called with "readline" as the method name, the corresponding string object does not have to be constructed anew.
_PyIO_str_readline is initialized to its actual value in PyInit__io, using the macro ADD_INTERNED:
/* Interned strings */
#define ADD_INTERNED(name) \
if (!_PyIO_str_ ## name && \
!(_PyIO_str_ ## name = PyUnicode_InternFromString(# name))) \
goto fail;
...
ADD_INTERNED(readline)
...
i.e. _PyIO_str_readline is a unicode object with the value "readline". Which readline method is actually called is resolved at run time and depends on what self actually is.
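ADD_INTERNED relies on two preprocessor operators: `##` pastes `_PyIO_str_` and the argument into one identifier, and `#` turns the argument into the string literal "readline". The following self-contained C sketch reproduces the same pattern without Python.h, purely as an illustration. `intern_from_string`, `demo_str_readline`, `demo_str_write` and `init_module` are hypothetical stand-ins for PyUnicode_InternFromString, the `_PyIO_str_*` globals and PyInit__io, and the "interning" here is only a heap copy of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicode_InternFromString: returns a heap
   copy of the string, or NULL if the allocation fails. */
static const char *intern_from_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Global cache slots that start out NULL, like
   `PyObject *_PyIO_str_readline = NULL;` in CPython. */
static const char *demo_str_readline = NULL;
static const char *demo_str_write = NULL;

/* Same shape as ADD_INTERNED: `##` builds the variable name, `#` builds
   the string literal, and a slot that is already filled is left alone. */
#define ADD_INTERNED(name) \
    if (!demo_str_ ## name && \
        !(demo_str_ ## name = intern_from_string(# name))) \
        goto fail;

/* Hypothetical counterpart of PyInit__io: fills every cache slot once.
   Returns 0 on success, -1 if an allocation failed. */
static int init_module(void)
{
    ADD_INTERNED(readline)
    ADD_INTERNED(write)
    return 0;
fail:
    return -1;
}
```

Calling init_module() a second time leaves both pointers unchanged; that is what the `!demo_str_ ## name &&` guard (mirroring `!_PyIO_str_ ## name &&`) is for.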

Related

How to convert OCaml signal to POSIX signal or string?

I run a subprocess from an OCaml program and check its termination status. If it exited normally (WEXITED int), I get the expected return code (0 usually indicating success).
However, if it was terminated by a signal (WSIGNALED int), I don't get the proper POSIX signal number. Instead, I get some (negative) OCaml-specific signal number.
How do I convert this nonstandard signal number to a proper POSIX signal number, for proper error reports? Alternatively, how do I convert this number to a string?
(I'm aware that there are tons of named integer values like Sys.sigabrt, but do I really have to write that large match statement myself? Moreover, I don't get why they didn't use a proper variant type in the first place, given that those signal numbers are OCaml-specific anyway.)
There is a function in the OCaml runtime that does this conversion (naturally). It is not kosher to call this function, but if you don't mind writing code that can break in future releases of OCaml (and other possibly bad outcomes), here is code that works for me:
A wrapper for the OCaml runtime function:
$ cat wrap.c
#include <caml/mlvalues.h>
extern int caml_convert_signal_number(int);
value oc_sig_to_host_sig(value ocsignum)
{
/* Convert a signal number from OCaml to host system.
*/
return Val_int(caml_convert_signal_number(Int_val(ocsignum)));
}
A test program.
$ cat m.ml
external convert : int -> int = "oc_sig_to_host_sig"
let main () =
Printf.printf "converted %d -> %d\n" Sys.sigint (convert Sys.sigint)
let () = main ()
Compile the program and try it out:
$ ocamlopt -o m -I $(ocamlopt -where) wrap.c m.ml
$ ./m
converted -6 -> 2
All in all, it might be better just to write some code that compares against the different signals defined in the Sys module and translates them to strings.
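The table-based approach can be sketched in C to match the wrapper above (in OCaml itself it would be the equivalent match on Sys.sigabrt, Sys.sigalrm, and so on). The sketch assumes OCaml's numbering Sys.sigabrt = -1, Sys.sigalrm = -2, ..., Sys.sigxfsz = -28; this order should be checked against sys.ml of the OCaml version in use. It is consistent with the test program's output above, where -6 (Sys.sigint) converts to 2. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signal names in the order OCaml's Sys module numbers them, assuming
   Sys.sigabrt = -1, Sys.sigalrm = -2, ..., Sys.sigxfsz = -28.
   Check this order against sys.ml of your OCaml version. */
static const char *const ocaml_signal_names[] = {
    "SIGABRT", "SIGALRM",   "SIGFPE",  "SIGHUP",  "SIGILL",  "SIGINT",
    "SIGKILL", "SIGPIPE",   "SIGQUIT", "SIGSEGV", "SIGTERM", "SIGUSR1",
    "SIGUSR2", "SIGCHLD",   "SIGCONT", "SIGSTOP", "SIGTSTP", "SIGTTIN",
    "SIGTTOU", "SIGVTALRM", "SIGPROF", "SIGBUS",  "SIGPOLL", "SIGSYS",
    "SIGTRAP", "SIGURG",    "SIGXCPU", "SIGXFSZ",
};

#define OCAML_SIGNAL_COUNT \
    ((int)(sizeof ocaml_signal_names / sizeof ocaml_signal_names[0]))

/* Hypothetical helper: OCaml signal number -> name, or NULL if the number
   is not one of the negative OCaml-specific codes in the table. */
const char *ocaml_signal_name(int ocaml_signum)
{
    if (ocaml_signum < 0 && ocaml_signum >= -OCAML_SIGNAL_COUNT)
        return ocaml_signal_names[-ocaml_signum - 1];
    return NULL;
}

/* Hypothetical reverse lookup: name -> OCaml signal number, or 0 if the
   name is unknown (0 is never one of the OCaml-specific codes). */
int ocaml_signal_from_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < OCAML_SIGNAL_COUNT; i++)
        if (strcmp(ocaml_signal_names[i], name) == 0)
            return -(i + 1);
    return 0;
}
```

Unlike calling caml_convert_signal_number, this does not depend on an unexported runtime symbol, but it has to be kept in sync with the Sys module by hand.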

Cython struct member scope

I have a data structure in Cython that uses a char * member.
What is happening is that the member value seems to lose its scope outside of a function that assigns a value to the member. See this example (using IPython):
[nav] In [26]: %%cython -f
...: ctypedef struct A:
...: char *s
...:
...: cdef char *_s
...:
...: cdef void fn(A *a, msg):
...:     s = msg.encode()
...:     a[0].s = s
...:
...: cdef A _a
...: _a.s = _s
...: fn(&_a, 'hello')
...: print(_a.s)
...: print(b'hola')
...: print(_a.s)
b'hello'
b'hola'
b"b'hola'"
It looks like _a.s is deallocated outside of fn and is being assigned any junk that is in memory that fits the slot.
This happens only under certain circumstances. For example, if I assign b'hello' to s instead of the encoded string inside fn(), the correct string is printed outside of the function.
As you can see, I also added an extra declaration for the char variable and assigned it to the struct before executing fn, to make sure that the _a.s pointer does not go out of scope. However, my suspicion is that the problem is assigning the member from a variable that is local to the function.
What is really happening here, and how do I resolve this issue?
Thanks.
Your problem is that the pointer a.s becomes dangling in fn right after it is set.
When msg.encode() is called, the temporary bytes object s is created and the address of its buffer is saved in a.s. However, directly afterwards (i.e. when the function exits), the temporary bytes object is destroyed and the pointer becomes dangling.
Because the bytes object is small, Python's small-object allocator (pymalloc) takes its memory from an arena, which is why accessing the address doesn't segfault (lucky you).
Although the temporary object is destroyed, its memory isn't overwritten/sanitized, so from a.s's point of view it looks as if the temporary object were still alive.
Whenever you create a new bytes object of similar size, the old memory from the arena might get reused, so your pointer a.s could end up pointing to the buffer of the newly allocated bytes object.
Btw, if you used a[0].s = msg.encode() directly (and I guess you did), Cython would refuse to compile it and tell you that you are trying to store a pointer derived from a temporary Python object. Adding an explicit reference fooled Cython, but didn't help your case.
So what to do about it? Which solution is appropriate depends on the bigger picture, but the available strategies are:
1. Manage the memory of a.s yourself, i.e. manually reserve memory, copy the data from the temporary object, and free the memory as soon as you are done.
2. Manage reference counting: add a PyObject * to the A struct, assign the temporary object to it (don't forget to increase the reference count manually), and decrease the reference count as soon as you are done.
3. Collect references to temporary objects in a pool (e.g. a list), which keeps them alive. Clear the pool as soon as the objects aren't needed anymore.
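Option 1 can be sketched in plain C; in Cython the same functions are available through `from libc.stdlib cimport malloc, free` and `from libc.string cimport memcpy, strlen`. The sketch assumes the struct from the question, with a `tmp` buffer standing in for the data of msg.encode(); `fn_copy` and `release` are hypothetical names. Because the struct owns its copy, overwriting or freeing the temporary afterwards no longer affects a.s. The struct's pointer must start out as NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the Cython struct in the question. */
typedef struct {
    char *s;
} A;

/* Instead of pointing a->s into a temporary buffer, copy the bytes into
   memory the struct owns. `tmp` plays the role of the buffer of
   msg.encode(), which may be freed right after this call returns.
   Returns 0 on success, -1 if the allocation fails. */
int fn_copy(A *a, const char *tmp)
{
    assert(a != NULL && tmp != NULL);
    size_t n = strlen(tmp) + 1;      /* include the terminating NUL */
    char *owned = malloc(n);
    if (owned == NULL)
        return -1;
    memcpy(owned, tmp, n);
    free(a->s);                      /* drop a previous copy, if any */
    a->s = owned;
    return 0;
}

/* Free the copy as soon as it is no longer needed. */
void release(A *a)
{
    assert(a != NULL);
    free(a->s);
    a->s = NULL;
}
```

The price is an O(n) copy per assignment and the obligation to call release(); options 2 and 3 avoid the copy by keeping the original bytes object alive instead.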
Not always the best, but the easiest is option 3: you have to manage neither the memory nor the reference counting:
%%cython
...
pool = []  # keeps the temporary bytes objects alive
cdef void fn(A *a, msg):
    s = msg.encode()
    pool.append(s)  # s now outlives fn, so a[0].s stays valid
    a[0].s = s
While this doesn't solve the principal problem, using PyUnicode_AsUTF8 (inspired by this answer) might be a satisfactory solution in this case:
%%cython
# it is not wrapped by `cpython.pxd`:
cdef extern from "Python.h":
    const char* PyUnicode_AsUTF8(object unicode)
...
cdef void fn(A *a, msg):
    a[0].s = PyUnicode_AsUTF8(msg)  # msg.encode() uses UTF-8 by default.
This has at least two advantages:
the pointer a[0].s is valid as long as msg is alive
calling PyUnicode_AsUTF8(msg) is faster than msg.encode(), because it reuses a cached buffer, so it is basically O(1) after the first call, while msg.encode() must at least copy the memory and is O(n), with n the number of characters.
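The caching argument can be illustrated with a toy model. In the following C sketch, `ToyStr`, `as_utf8`, `encode` and `toy_free` are hypothetical and much simpler than CPython's real unicode objects. The point is only that a conversion cached on the object runs once and is then returned in O(1), and that the cached pointer lives exactly as long as the object, while an encode-style call copies all n bytes every time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string object (hypothetical; CPython's real layout differs). */
typedef struct {
    const char *text;   /* stands in for the object's own character data */
    char *utf8_cache;   /* NULL until as_utf8() is called the first time */
    int conversions;    /* how many times an O(n) copy has actually run */
} ToyStr;

/* Copies the text into a new heap buffer: the O(n) part. */
static char *copy_text(ToyStr *o)
{
    size_t n = strlen(o->text) + 1;
    char *buf = malloc(n);
    if (buf != NULL) {
        memcpy(buf, o->text, n);
        o->conversions++;
    }
    return buf;
}

/* Cached conversion, in the spirit of PyUnicode_AsUTF8: the first call
   pays O(n), later calls return the stored buffer in O(1). The pointer
   stays valid for as long as the object is alive. */
const char *as_utf8(ToyStr *o)
{
    assert(o != NULL && o->text != NULL);
    if (o->utf8_cache == NULL)
        o->utf8_cache = copy_text(o);
    return o->utf8_cache;
}

/* Uncached conversion, in the spirit of msg.encode(): every call makes a
   fresh O(n) copy that the caller owns and must free. */
char *encode(ToyStr *o)
{
    assert(o != NULL && o->text != NULL);
    return copy_text(o);
}

/* Destroying the object also frees the cache, so pointers obtained from
   as_utf8() dangle afterwards, like a[0].s once msg is gone. */
void toy_free(ToyStr *o)
{
    assert(o != NULL);
    free(o->utf8_cache);
    o->utf8_cache = NULL;
}
```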

How to pass a vector as parameter to a c++ library from a CLI/C++ Wrapper?

I have found similar questions but none that worked for my situation, so I am asking my own.
I want to use a library function that takes a pointer to a std::vector, and fills it with data.
I already have a C++/CLI Wrapper set up.
I am currently trying to instantiate the vector in the wrapper:
private:
std::vector<int>* outputVector;
and in the constructor I instantiate it:
outputVector = new std::vector<int>();
Now, in the wrapper method that calls the C++ library function:
m_pUnmanagedTPRTreeClass->GetInRegion(..., &outputVector)
I omitted the other parameters because they don't matter in this case. I can already use other functions of the library, and they work without a problem. I just can't manage to pass a pointer to a std::vector.
With this code, I get the error message:
error C2664: 'TPSimpleRTree<CT,T>::GetInRegion' : cannot convert parameter 3 from 'cli::interior_ptr<Type>' to 'std::vector<_Ty> &'
I have tried removing the "&" (I am not great at C++ and am unsure how to use pointers correctly). Then the error becomes:
error C2664: 'TPSimpleRTree<CT,T>::GetInRegion' : cannot convert parameter 3 from 'std::vector<_Ty> *' to 'std::vector<_Ty> &'
EDIT: I have tried replacing "&" with "*"; it does not work either, and I get the error:
cannot convert from 'std::vector<_Ty>' to 'std::vector<_Ty> &'
The signature of the C++ function, as far as the vector parameter is concerned, is:
GetInRegion(..., std::vector<T*>& a_objects)
Given the signature:
GetInRegion(..., std::vector<T*>& a_objects)
You would call this (in C++ or C++/CLI) like:
std::vector<int*> v;
m_pUnmanagedTPRTreeClass->GetInRegion(..., v);
Then you can manipulate the data as needed or marshal it into a .NET container.
'std::vector<_Ty> *' to 'std::vector<_Ty> &'
is self-explanatory: you need to dereference the pointer instead of taking its address, so instead of:
m_pUnmanagedTPRTreeClass->GetInRegion(..., &outputVector)
use:
m_pUnmanagedTPRTreeClass->GetInRegion(..., *outputVector)
^~~~~~~!!
After your edit, I see that your GetInRegion signature is:
GetInRegion(..., std::vector<T*>& a_objects)
so it accepts a std::vector whose elements are pointers (T*), while you are trying to pass GetInRegion a std::vector<int>, whose elements are not pointers.

Lua Alien - Pointer Arithmetic and Dereferencing

My goal is to call Windows' GetModuleInformation function to get a MODULEINFO struct back. This is all working fine. The problem comes as a result of me wanting to do pointer arithmetic and dereferences on the LPVOID lpBaseOfDll which is part of the MODULEINFO.
Here is my code to call the function in Lua:
require "luarocks.require"
require "alien"
sizeofMODULEINFO = 12 --Gotten from sizeof(MODULEINFO) from Visual Studio
MODULEINFO = alien.defstruct{
{"lpBaseOfDll", "pointer"}; --Does this need to be a buffer? If so, how?
{"SizeOfImage", "ulong"};
{"EntryPoint", "pointer"};
}
local GetModuleInformation = alien.Kernel32.K32GetModuleInformation
GetModuleInformation:types{ret = "int", abi = "stdcall", "long", "pointer", "pointer", "ulong"}
local GetModuleHandle = alien.Kernel32.GetModuleHandleA
GetModuleHandle:types{ret = "pointer", abi = "stdcall", "pointer"}
local GetCurrentProcess = alien.Kernel32.GetCurrentProcess
GetCurrentProcess:types{ret = "long", abi = "stdcall"}
local mod = MODULEINFO:new() --Create struct (needs buffer?)
local currentProcess = GetCurrentProcess()
local moduleHandle = GetModuleHandle("myModule.dll")
local success = GetModuleInformation(currentProcess, moduleHandle, mod(), sizeofMODULEINFO)
if success == 0 then --If there is an error, exit
return 0
end
local dataPtr = mod.lpBaseOfDll
--Now how do I do pointer arithmetic and/or dereference "dataPtr"?
At this point, mod.SizeOfImage seems to give me the values I expect, so I know the functions are being called and the struct is being populated. However, I cannot do pointer arithmetic on mod.lpBaseOfDll, because it is a userdata.
The only information in the Alien Documentation that may address what I'm trying to do are these:
Pointer Unpacking
Alien also provides three convenience functions that let you dereference a pointer and convert the value to a Lua type:
alien.tostring takes a userdata (usually returned from a function that has a pointer return value), casts it to char*, and returns a Lua string. You can supply an optional size argument (if you don’t, Alien calls strlen on the buffer first).
alien.toint takes a userdata, casts it to int*, dereferences it and returns it as a number. If you pass it a number it assumes the userdata is an array with this number of elements.
alien.toshort, alien.tolong, alien.tofloat, and alien.todouble are like alien.toint, but work with the respective typecasts. Unsigned versions are also available.
My issue with those is that I need to go byte-by-byte, and there is no alien.tochar function. Also, and more importantly, they still don't let me read data at an offset from the base address.
Buffers
After making a buffer you can pass it in place of any argument of string or pointer type.
...
You can also pass a buffer or other userdata to the new method of your struct type, and in this case this will be the backing store of the struct instance you are creating. This is useful for unpacking a foreign struct that a C function returned.
These seem to suggest that I can use an alien.buffer for MODULEINFO's LPVOID lpBaseOfDll. Buffers are described as byte arrays, which can be indexed with the notation buf[1], buf[2], etc. Since buffers are addressed by bytes, this would ideally solve all my problems (if I am understanding this correctly).
Unfortunately, I cannot find any examples of this anywhere (not in the docs, Stack Overflow, Google, etc.), so I have no idea how to do it. I've tried a few syntax variations, but nearly every one gives a runtime error (the others simply don't work as expected).
Any insight on how I might be able to go byte-by-byte (C char-by-char) across the mod.lpBaseOfDll through dereferences and pointer arithmetic?
I need to go byte-by-byte, and there is no alien.tochar function.
Sounds like alien.tostring has you covered:
alien.tostring takes a userdata (usually returned from a function that has a pointer return value), casts it to char*, and returns a Lua string. You can supply an optional size argument (if you don’t Alien calls strlen on the buffer first).
Lua strings can contain arbitrary byte values, including 0 (i.e. they aren't null-terminated like C strings), so as long as you pass a size argument to alien.tostring you can get back data as a byte buffer, aka Lua string, and do whatever you please with the bytes.
It sounds like you can't tell it to start at an arbitrary offset from the given pointer address. The easiest way to tell for sure, if the documentation doesn't tell you, is to look at the source. It would probably be trivial to add an offset parameter.
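For reference, the following C sketch shows what an offset parameter would amount to at the C level. `read_bytes_at` is a hypothetical helper, not part of Alien: it treats the base address as a byte pointer, advances it by `offset` bytes (pointer arithmetic on `unsigned char *` counts in bytes), and copies an explicit number of bytes, so embedded zero bytes are handled like any other byte instead of stopping a strlen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `size` bytes starting `offset` bytes past
   `base` into `dst`. The caller must guarantee that the range
   [base + offset, base + offset + size) is readable memory, e.g. that it
   lies inside the module image (offset + size <= SizeOfImage). */
void read_bytes_at(const void *base, size_t offset, size_t size,
                   unsigned char *dst)
{
    assert(base != NULL && dst != NULL);
    const unsigned char *p = (const unsigned char *)base + offset;
    memcpy(dst, p, size);
}
```

Returning the copied bytes as a Lua string of the given length, as alien.tostring does with its size argument, would then give byte-by-byte access from Lua.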

GCC, linker-script: Variables that resolve to manually defined addresses?

I'll use a simple specific example to illustrate what I'm trying to do.
file main.c:
#include <stdio.h>
unsigned int X;
int main()
{
printf("&X = %p\r\n", (void *)&X);
return 0;
}
I want to know if it's possible (using a linker-script/gcc options) to manually specify an address for X at compile/link time, because I know it lies somewhere in memory, outside my executable.
I only want to know whether this is possible. I know I can use a pointer (i.e. unsigned int*) to access a specific memory location (r/w), but that's not what I'm after.
What I'm after is making GCC generate code in which all accesses to global variables/static function variables are either done through a level of indirection, i.e. through a pointer (-fPIC not good enough because static global vars are not accessed via GOT) or their addresses can be manually specified (at link/compile time).
Thank you
What I'm after is making GCC generate code in which all accesses to
global variables/static function variables … their addresses can be
manually specified (at link/compile time).
You can specify the addresses of the .bss and .data sections (which contain the uninitialized and initialized variables respectively) with linker commands. The relative placement of the variables in the sections is up to the compiler/linker.
If you need only individual variables to be placed, this can be done by declaring them extern (e.g. extern unsigned int X; in main.c) and specifying their addresses in a file, e.g. addresses.ld:
X = 0x12345678;
(note: the spaces around = are needed), and then adding that file to the compiler/linker arguments:
cc main.c addresses.ld
