Removing Python overhead when wrapping C++ vectors

from libcpp.algorithm cimport sort as stdsort
from libcpp.algorithm cimport unique
from libcpp.vector cimport vector
# from libcpp cimport bool
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.initializedcheck(False)
cdef class Vector:

    cdef vector[cython.int] wrapped_vector

    # the easiest thing to do is add short wrappers for the methods you need
    def push_back(self, int num):
        self.wrapped_vector.push_back(num)

    def sort(self):
        stdsort(self.wrapped_vector.begin(), self.wrapped_vector.end())

    def unique(self):
        self.wrapped_vector.erase(unique(self.wrapped_vector.begin(), self.wrapped_vector.end()), self.wrapped_vector.end())

    def __str__(self):
        return "[" + ", ".join([str(i) for i in self.wrapped_vector]) + "]"

    def __repr__(self):
        return str(self)

    def __len__(self):
        return self.wrapped_vector.size()

    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.initializedcheck(False)
    def __setitem__(self, int key, int item):
        self.wrapped_vector[key] = item

    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.initializedcheck(False)
    def __getitem__(self, int key):
        return self.wrapped_vector[key]
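(For reference, a minimal usage sketch from Python once the extension above is compiled; the module name fast_vector is hypothetical:)

from fast_vector import Vector  # hypothetical name of the compiled extension module

v = Vector()
for x in (3, 1, 2, 2):
    v.push_back(x)
v.sort()
v.unique()
print(v, len(v), v[0])  # [1, 2, 3] 3 1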
I have tried to wrap vectors so that I can use them in Python dicts.
This seems to create crazy amounts of overhead. See lines 72 and 75 for example; they just add an integer to the number already in the vector.
Is it possible to remove this overhead, or is this the price I pay for wrapping vectors?

This seems to be based on my answer to another question. The purpose of adding __getitem__ and __setitem__ to the cdef class Vector is purely so that it can be indexed from Python. From Cython you can index into the C++ vector directly for extra speed.
At the start of your files_to_bins add the line:
cdef Vector v
This will get Cython to make sure that anything assigned to v is a Vector object (it'll raise a TypeError if not) and thus you'll be allowed to access its cdef attributes directly.
Then change the line:
v[i] = v[i] + half_fragment_size
to:
v.wrapped_vector[i] = v.wrapped_vector[i] + half_fragment_size
(and similarly for the other indexing lines)
Be aware that boundscheck(False) and wraparound(False) are doing absolutely nothing for C++ objects. The C++ indexing operator performs no bounds checking (and Cython doesn't add it in), and it does not support negative indexing either. boundscheck and wraparound only apply to indexing memoryviews or numpy arrays.
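For illustration, here is a minimal sketch of what the indexing part of a function like files_to_bins could then look like; the signature, the vectors dict and half_fragment_size are assumptions based on the question, not code taken from it:

def files_to_bins(dict vectors, int half_fragment_size):
    cdef Vector v   # typed as Vector, so its cdef attributes are visible
    cdef size_t i
    for v in vectors.values():
        for i in range(v.wrapped_vector.size()):
            # compiles to a plain C++ operator[] access, no Python call per element
            v.wrapped_vector[i] = v.wrapped_vector[i] + half_fragment_size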

Suppress empty result from 'many' in 'seq(p, many(p))' construct with parser combinators

I'm trying to build parser combinators following Hutton and Meijer, "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers and not my specific implementation.
As a small exercise, I'm using the parsers to recognize regular expressions.
(pc9.ps)run
/Dot (.) char def
/Meta (*+?) anyof def
/Character (*+?.|()) noneof def
/Atom //Dot
//Character plus def
/Factor //Atom //Meta maybe seq def
/Term //Factor //Factor many seq def
/Expression //Term (|) char //Term xthen many seq def
/regex { string-input //Expression exec ps } def
(abc|def|ghi) regex
quit
It's working, but the output has lots of [] empty arrays that really get in the way when I try to bind handlers to process the values.
$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]
These appear whenever a seq sequencing combinator accepts a result from maybe or many (which uses maybe) that matched zero occurrences.
What is the normal way of excluding this extra noise in the output with Parser Combinators?
github repo
Sigh. It seems I can just implement around it. I added special code in seq to detect an empty right-hand side and just discard it. On to other problems...
Edit: I encountered the same problem again in version 11 (and a half). Now I've got a better solution IMO:
https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ
Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when I move my arm like this; ... so don't move your arm like that" problem.
I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the usual PostScript types: integer, real, string, boolean, array, dictionary. But I also need some way to arbitrarily combine or concatenate two objects regardless of type. My then (aka seq) combinator needs to do this. So I made a hack-y function that does the combining. If it has two arrays, it composes the contents into a longer array. If it has one array and some other object it extends the array by one and stuffs the object in the front or back as appropriate. If it has two non-array objects it makes a new 2-element array to contain them.
So, instead of building xthen and thenx off of then and needing to cons, car, and cdr the stuff, I can write all 3 of these as a more general parameterized function.
sequence{ p q u }{
{ /p exec +is-ok {
next x-xs force /q exec +is-ok {
next x-xs 3 1 roll /u exec exch consok
}{
x-xs 3 2 roll ( after ) exch cons exch cons cons
} ifelse
} if } ll } #func
then { {append} sequence }
xthen { {exch pop} sequence }
thenx { {pop} sequence }
append { 1 index zero eq { exch pop }{
dup zero eq { pop }{
1 index type /arraytype eq {
dup type /arraytype eq { compose }{ one compose } ifelse
}{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }
(#func is my own non-standard extension to PostScript that takes a procedure body and a list of parameters and wraps the procedure with code that defines the arguments in a local dictionary. ll is my hack-y PostScript way of making lambdas with hard-patched parameters; it's short for load all literals.)
The code also treats executable arrays (i.e. PostScript procedures) as a non-array for the purpose of combining sequences of results. This allows the parser to be used as a syntax-directed compiler producing procedures as output.
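To make the combining rule concrete outside PostScript, here is a toy sketch in Python (my own illustration, not the author's code; EMPTY and combine are hypothetical stand-ins for the zero-occurrences result and the then/append step):

EMPTY = ()  # stands in for the "zero occurrences" result from maybe/many

def combine(left, right):
    # two arrays -> compose the contents into one longer array
    # one array and a scalar -> extend the array at the front or back
    # two scalars -> a new 2-element array
    # an EMPTY result simply disappears instead of nesting as []
    if left == EMPTY:
        return right
    if right == EMPTY:
        return left
    if isinstance(left, list) and isinstance(right, list):
        return left + right
    if isinstance(left, list):
        return left + [right]
    if isinstance(right, list):
        return [left] + right
    return [left, right]

print(combine([97, 98], EMPTY))  # [97, 98] -- no trailing empty result
print(combine(97, [98, 99]))     # [97, 98, 99]
print(combine(97, 98))           # [97, 98]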

Python PEP 484 Type Hints -> return type either class name or None?

I'm using Python 3.6.5.
Class A, below for me represents a database table, using SQLAlchemy.
I'm defining a @staticmethod that returns a row, but if there's no result, it would return None.
Since it returns an instance of class A, the notation normally goes:
-> A:
at the end of the def signature, but because A is not yet defined at that point (the method is on class A itself), you are supposed to quote it as:
-> 'A':
Is the -> 'A': sufficient?
Or is there some sort of OR syntax?
Thanks in advance for your advice.
You can use Optional[A], which means that it can return either A or None.
To make an "or" between classes A and B, use Union[A, B].
Note that you should import Optional and Union from typing.
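A minimal sketch of how this can look (the method name and body are illustrative, not from the question; the forward-reference quotes are still needed because the annotation appears inside class A itself):

from typing import Optional

class A:
    @staticmethod
    def get_by_id(row_id: int) -> Optional['A']:
        """Return the matching row, or None when there is no result."""
        # hypothetical lookup; a real version would run a SQLAlchemy query
        return None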

Extract nth element of a tuple

For a list, you can do pattern matching and iterate until the nth element, but for a tuple, how would you grab the nth element?
TL;DR: Stop trying to access the n-th element of a t-uple directly and use a record or an array instead, as they allow random access.
You can grab the n-th element by unpacking the t-uple with value deconstruction, either by a let construct, a match construct or a function definition:
let ivuple = (5, 2, 1, 1)

let squared_sum_let =
  let (a,b,c,d) = ivuple in
  a*a + b*b + c*c + d*d

let squared_sum_match =
  match ivuple with (a,b,c,d) -> a*a + b*b + c*c + d*d

let squared_sum_fun (a,b,c,d) =
  a*a + b*b + c*c + d*d
The match-construct has here no virtue over the let-construct, it is just included for the sake of completeness.
Do not use t-uples, Don¹
There are only a few cases where using t-uples to represent a type is the right thing to do. Most of the time, we pick a t-uple because we are too lazy to define a type, and we should interpret the problem of accessing the n-th field of a t-uple or iterating over the fields of a t-uple as a serious signal that it is time to switch to a proper type.
There are two natural replacements to t-uples: records and arrays.
When to use records
We can see a record as a t-uple whose entries are labelled; as such, they are definitely the most natural replacement to t-uples if we want to access them directly.
type ivuple = {
  a: int;
  b: int;
  c: int;
  d: int;
}
We then access the field a of a value x of type ivuple directly by writing x.a. Note that records are easily copied with modifications, as in let y = { x with d = 0 }. There is no natural way to iterate over the fields of a record, mostly because a record does not need to be homogeneous.
When to use arrays
A large² homogeneous collection of values is adequately represented by an array, which allows direct access, iterating and folding. A possible inconvenience is that the size of an array is not part of its type, but for arrays of fixed size, this is easily circumvented by introducing a private type — or even an abstract type. I described an example of this technique in my answer to the question “OCaml compiler check for vector lengths”.
Note on float boxing
When using floats in t-uples, in records containing only floats and in arrays, these are unboxed. We should therefore not notice any performance difference when changing from one type to the other in our numeric computations.
¹ See the TeXbook.
² Large starts near 4.
Since the length of OCaml tuples is part of the type and hence known (and fixed) at compile time, you get the n-th item by straightforward pattern matching on the tuple. For the same reason, the problem of extracting the n-th element of an "arbitrary-length tuple" cannot occur in practice - such a "tuple" cannot be expressed in OCaml's type system.
You might still not want to write out a pattern every time you need to project a tuple, and nothing prevents you from generating the functions get_1_1...get_i_j... that extract the i-th element from a j-tuple for any possible combination of i and j occurring in your code, e.g.
let get_1_1 (a) = a
let get_1_2 (a,_) = a
let get_2_2 (_,a) = a
let get_1_3 (a,_,_) = a
let get_2_3 (_,a,_) = a
...
Not necessarily pretty, but possible.
Note: Previously I had claimed that OCaml tuples can have at most length 255 and you can simply generate all possible tuple projections once and for all. As @Virgile pointed out in the comments, this is incorrect - tuples can be huge. This means that it is impractical to generate all possible tuple projection functions upfront, hence the restriction "occurring in your code" above.
It's not possible to write such a function in full generality in OCaml. One way to see this is to think about what type the function would have. There are two problems. First, each size of tuple is a different type. So you can't write a function that accesses elements of tuples of different sizes. The second problem is that different elements of a tuple can have different types. Lists don't have either of these problems, which is why you can have List.nth.
If you're willing to work with a fixed size tuple whose elements are all the same type, you can write a function as shown by @user2361830.
Update
If you really have collections of values of the same type that you want to access by index, you should probably be using an array.
Here is a function which returns the string of the OCaml function you need to do that ;) Very helpful, I use it frequently.
let tup len n =
  if n >= 0 && n < len then
    let rec rep str nn = match nn < 1 with
      | true -> ""
      | _ -> str ^ (rep str (nn-1)) in
    let txt1 = "let t" ^ (string_of_int len) ^ "_" ^ (string_of_int n) ^ " tup = match tup with |" ^ (rep "_," n) ^ "a"
    and txt2 = "," ^ (rep "_," (len-n-2))
    and txt3 = "->a" in
    if n = len-1 then
      print_string (txt1 ^ txt3)
    else
      print_string (txt1 ^ txt2 ^ "_" ^ txt3)
  else raise (Failure "Error") ;;
For example:
tup 8 6;;
prints:
let t8_6 tup = match tup with |_,_,_,_,_,_,a,_->a
and of course:
val t8_6 : 'a * 'b * 'c * 'd * 'e * 'f * 'g * 'h -> 'g = <fun>

pyparsing for querying a database of chemical elements

I would like to parse a query for a database of chemical elements.
The database is stored in an XML file. Parsing that file produces a nested dictionary that is stored in a singleton object that inherits from collections.OrderedDict.
Asking for an element will give me an ordered dictionary of its corresponding properties
(i.e. ELEMENTS['C'] --> {'name':'carbon','neutron' : 0,'proton':6, ...}).
Conversely, asking for a property will give me an ordered dictionary of its values for all the elements (i.e. ELEMENTS['proton'] --> {'H' : 1, 'He' : 2} ...).
A typical query could be:
mass > 10 or (nucleon < 20 and atomic_radius < 5)
where each 'subquery' (i.e. mass > 10) will return the set of elements that matches it.
Then, the query will be converted and transformed internally into a string that will be evaluated further to produce a set of the indexes of the elements that matched it. In that context the operators and/or are not boolean operators but rather set operators that act upon Python sets.
I recently sent a post about building such a query. Thanks to the useful answers I got, I think that I did more or less the job (I hope in a nice way!) but I still have some questions related to pyparsing.
Here is my code:
import numpy
from pyparsing import *
# This imports a singleton object storing the database dictionary as
# described earlier
from ElementsDatabase import ELEMENTS
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)
# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
# When parsed, it must be replaced by the following expression that
# will be eval later.
props.setParseAction(lambda t : "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())
quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?)?([eE][+-]?\d+)?').setParseAction(lambda t:float(t[0]))
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
comparison_expr.setParseAction(lambda t : "set(numpy.where(%s)%s%s)" % tuple(t))
grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT), (or_operator, 2, opAssoc.LEFT)]))
# A test query
res = grammar.parseString('"mass " > 30 or (nucleon == 1)',parseAll=True)
print eval(' '.join(res._asStringList()))
My questions are the following:
1. Using 'transformString' instead of 'parseString' never triggers any exception, even when the string to be parsed does not match the grammar. However, it is exactly the functionality I need. Is there a way to do so?
2. I would like to reintroduce whitespace between my tokens so that my eval does not fail. The only way I found to do so is the one implemented above. Do you see a better way using pyparsing?
Sorry for the long post, but I wanted to introduce its context in deeper detail. BTW, if you find this approach bad, do not hesitate to tell me!
Thank you very much for your help.
Eric
Do not worry about my concern; I found a workaround. I used the SimpleBool.py example shipped with pyparsing (thanks for the hint, Paul).
Basically, I used the following approach:
1. For each subquery (i.e. mass > 10), using the setParseAction method, I attached a function that returns the set of elements that matched the subquery.
2. Then, I attached the following functions to each logical operator (and, or and not):
def not_operator(token):
    _, s = token[0]
    # ELEMENTS is the singleton described in my original post
    return set(ELEMENTS.keys()).difference(s)

def and_operator(token):
    s1, _, s2 = token[0]
    # set intersection (the boolean 'and' would just return one of the operands)
    return s1 & s2

def or_operator(token):
    s1, _, s2 = token[0]
    # set union (the boolean 'or' would just return one of the operands)
    return s1 | s2

# Thanks to Paul for the hint.
grammar = operatorPrecedence(comparison_expr,
                             [(not_token, 1, opAssoc.RIGHT, not_operator),
                              (and_token, 2, opAssoc.LEFT, and_operator),
                              (or_token, 2, opAssoc.LEFT, or_operator)])
Please note that these operators act upon Python sets rather than on booleans.
And that does the job.
I hope that this approach will help some of you.
Eric
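To make the set semantics concrete, here is a small self-contained sketch along the same lines; the toy DATA table, compare and the action names are my own illustration rather than the ELEMENTS singleton from the question, and infixNotation is simply the newer name of operatorPrecedence:

from pyparsing import CaselessKeyword, Regex, Word, alphas, infixNotation, oneOf, opAssoc

# hypothetical toy table: element symbol -> properties
DATA = {
    'H':  {'mass': 1.0,  'proton': 1},
    'He': {'mass': 4.0,  'proton': 2},
    'C':  {'mass': 12.0, 'proton': 6},
}

OPS = {'==': lambda a, b: a == b, '!=': lambda a, b: a != b,
       '>=': lambda a, b: a >= b, '<=': lambda a, b: a <= b,
       '>':  lambda a, b: a > b,  '<':  lambda a, b: a < b}

def compare(tokens):
    # each subquery is immediately turned into the set of matching elements
    prop, op, value = tokens
    return {name for name, props in DATA.items() if OPS[op](props[prop], value)}

number = Regex(r'[+-]?\d+(\.\d*)?').setParseAction(lambda t: float(t[0]))
comparison = (Word(alphas) + oneOf(list(OPS)) + number).setParseAction(compare)

def and_action(tokens):
    s1, _, s2 = tokens[0]  # one binary application per group, as in the answer above
    return s1 & s2         # set intersection, not the boolean 'and'

def or_action(tokens):
    s1, _, s2 = tokens[0]
    return s1 | s2         # set union, not the boolean 'or'

query = infixNotation(comparison,
                      [(CaselessKeyword('and'), 2, opAssoc.LEFT, and_action),
                       (CaselessKeyword('or'), 2, opAssoc.LEFT, or_action)])

print(query.parseString('mass > 2 and proton < 6', parseAll=True)[0])  # {'He'}

As in the answer above, each action unpacks a single binary application; a chain like a and b and c arrives as one flat group, so a real implementation would loop over the operands.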

Recursive List Creation Function. Errors in type

I have an OCaml function that is giving me errors.
What I am trying to do:
Recursively create a List of random numbers (0-2) of size "limit".
Here's what I have:
let rec carDoorNumbers = fun limit ->
  match limit with
  | [1] -> Random.int 3
  | [] -> Random.int 3 :: carDoorNumbers (limit-1);;
I am getting this error:
Error: This expression has type 'a list
but an expression was expected of type int
Think about what your function has to do: given a limit, you have to create a list of numbers. So your type is something like carDoorNumbers : int -> int list.
Looking at that, it seems you have two errors. First, you're matching limit (which should be an int) against a list pattern. [1] -> ... matches a list containing only the element 1 and [] matches the empty list; you really want to match against the number 1 and any other number n.
The second error is that you return two different types in your match statement. Remember that you are supposed to be returning a list. In the first case, you are returning Random.int 3, which is an int rather than an int list. What you really want to return here is something like [Random.int 3].
The error you got is a little confusing. Since the first thing you returned was an int, it expects your second thing to also be an int. However, your second case was actually correct: you do return an int list! However, the compiler does not know what you meant, so its error is backwards; rather than changing the int list to an int, you need to change the int to an int list.
Your match expression treats limit like a list. Both [1] and [] are lists. That's what the compiler is telling you. But it seems limit should be an integer.
To match an integer, just use an integer constant. No square brackets.
(As a side comment, you might want to be sure the function works well when you pass it 0.)
