pyparsing delimitedList(..., combine=True) giving inconsistent result - pyparsing

I'm using pyparsing==2.1.5 with Python 3.4, and I'm getting what seems to be an odd result:
word = Word(alphanums)
word_list_no_combine = delimitedList(word, combine=False)
word_list_combine = delimitedList(word, combine=True)
print(word_list_no_combine.parseString('one, two')) # ['one', 'two']
print(word_list_no_combine.parseString('one,two')) # ['one', 'two']
print(word_list_combine.parseString('one, two')) # ['one']: ODD ONE OUT
print(word_list_combine.parseString('one,two')) # ['one,two']
It's not obvious to me why the "combine" option causes one of the parts of the list to be swallowed when a space is present, but not when it's absent. Is this a pyparsing bug or am I missing something obvious?

Rather than modify pyparsing, I suggest you do this work using normal uncombined delimited list with a custom parse action:
word_list_combine_using_parse_action = word_list_no_combine.copy().setParseAction(','.join)
print(word_list_combine_using_parse_action.parseString('one, two'))
Will print one,two

Looks like it's due to the behaviour of Combine(), specifically its default "adjacent=True" option, which is then used by delimitedList():
class Combine(TokenConverter):
"""Converter to concatenate all matching tokens to a single string.
By default, the matching patterns must also be contiguous in the input string;
this can be disabled by specifying C{'adjacent=False'} in the constructor.
"""
def __init__( self, expr, joinString="", adjacent=True ):
# ...
def delimitedList( expr, delim=",", combine=False ):
# ...
dlName = _ustr(expr)+" ["+_ustr(delim)+" "+_ustr(expr)+"]..."
if combine:
return Combine( expr + ZeroOrMore( delim + expr ) ).setName(dlName)
else:
return ( expr + ZeroOrMore( Suppress( delim ) + expr ) ).setName(dlName)
So it can be solved with a replacement:
def delimitedListPlus(expr, delim=",", combine=False, combine_adjacent=False):
dlName = str(expr) + " [" + str(delim) + " " + str(expr) + "]..."
if combine:
return Combine(expr + ZeroOrMore(delim + expr),
adjacent=combine_adjacent).setName(dlName)
else:
return (expr + ZeroOrMore(Suppress(delim) + expr)).setName(dlName)

Related

pyparsing parse c/cpp enums with values as user defined macros

I have a usecase where i need to match enums where values can be userdefined macros.
Example enum
typedef enum
{
VAL_1 = -1
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = **TEST_ENUM_CUSTOM(1,2)**,
}MyENUM;
I am using the below code, if i don't use format as in VAL_4 it works. I need match format as in VAL_4 as well. I am new to pyparsing, any help is appeciated.
My code:
BRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
_enum = Suppress("enum")
identifier = Word(alphas, alphanums + "_")
integer = Word("-"+alphanums) **#I have tried to "_(,)" to this but is not matching.**
enumValue = Group(identifier("name") + Optional(EQ + integer("value")))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue) + Optional(COMMA))
enum = _enum + Optional(identifier("enum")) + LBRACE + enumList("names") + RBRACE + Optional(identifier("typedef"))
enum.ignore(cppStyleComment)
enum.ignore(cStyleComment)
Thanks
-Purna
Just adding more characters to integer is just the wrong way to go. Even this expression:
integer = Word("-"+alphanums)
isn't super-great, since it would match "---", "xyz", "q--10-", and many other non-integer strings.
Better to define integer properly. You could do:
integer = Combine(Optional('-') + Word(nums))
but I've found that for these low-level expressions that occur many places in your parse string, a Regex is best:
integer = Regex(r"-?\d+") # Regex(r"-?[0-9]+") if you like more readable re's
Then define one for hex_integer also,
Then to add macros, we need a recursive expression, to handle the possibility of macros having arguments that are also macros.
So at this point, we should just stop writing code for a bit, and do some design. In parser development, this design usually looks like a BNF, where you describe your parser in a sort of pseudocode:
enum_expr ::= "typedef" "enum" [identifier]
"{"
enum_item_list
"}" [identifier] ";"
enum_item_list ::= enum_item ["," enum_item]... [","]
enum_item ::= identifier "=" enum_value
enum_value ::= integer | hex_integer | macro_expression
macro_expression ::= identifier "(" enum_value ["," enum_value]... ")"
Note the recursion of macro_expression: it is used in defining enum_value, but it includes enum_value as part of its own definition. In pyparsing, we use a Forward to set up this kind of recursion.
See how that BNF is implemented in the code below. I build on some of the items you posted, but the macro expression required some rework. The bottom line is "don't just keep adding characters to integer trying to get something to work."
LBRACE, RBRACE, EQ, COMMA, LPAR, RPAR, SEMI = map(Suppress, "{}=,();")
_typedef = Keyword("typedef").suppress()
_enum = Keyword("enum").suppress()
identifier = Word(alphas, alphanums + "_")
# define an enumValue expression that is recursive, so that enumValues
# that are macros can take parameters that are enumValues
enumValue = Forward()
# add more types as needed - parse action on hex_integer will do parse-time
# conversion to int
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
# or just use the signed_integer expression found in pyparsing_common
# integer = pyparsing_common.signed_integer
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))
# a macro defined using enumValue for parameters
macro_expr = Group(identifier + LPAR + Group(delimitedList(enumValue)) + RPAR)
# use '<<=' operator to attach recursive definition to enumValue
enumValue <<= hex_integer | integer | macro_expr
# remaining enum expressions
enumItem = Group(identifier("name") + Optional(EQ + enumValue("value")))
enumList = Group(delimitedList(enumItem) + Optional(COMMA))
enum = (_typedef
+ _enum
+ Optional(identifier("enum"))
+ LBRACE
+ enumList("names")
+ RBRACE
+ Optional(identifier("typedef"))
+ SEMI
)
# this comment style includes cStyleComment too, so no need to
# ignore both
enum.ignore(cppStyleComment)
Try it out:
enum.runTests([
"""
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2)
}MyENUM;
""",
])
runTests is for testing and debugging your parser during development. Use enum.parseString(some_enum_expression) or enum.searchString(some_c_header_file_text) to get the actual parse results.
Using the new railroad diagram feature in the upcoming pyparsing 3.0 release, here is a visual representation of this parser:

building a nested pyparsing.Dict from a nested list

My input is growing from simple 2-level nested lists to a complex nested list of lists. I see where pyparsing.nestedExpr() is the bees' knees for this kind of thing, but I'm still wanting to build up a nested Dict structure.
With the basics somewhat squared away I've crafted this:
import pyparsing as pp
input_works = '''
(unitsOfMeasure
(altitudeUnits "m")
(capacitanceUnits "pF")
(designUnits "MIL")
(drawingUnits "MIL")
(drawingAccuracy 2)
(drawingHeight 28000)
)''
# recursive dict
input_doesnt_work = '''
(parameterFile "out.tf"
(revision "15.6")
(xcoord 1234.567)
(ycoord -3456.890)
(unitsOfMeasure
(altitudeUnits "m")
(capacitanceUnits "pF")
(designUnits "MIL")
(drawingUnits "MIL")
(drawingAccuracy 2)
(drawingHeight 28000)
)
)'''
v_string = pp.Word(pp.alphanums+'_'+'-'+'.')
v_quoted_string = pp.Combine( '"' + v_string + '"')
v_number = pp.Regex(r'[+-]?(?P<float1>\d+)(?P<float2>\.\d+)?(?P<float3>[Ee][+-]?\d+)?')
keyy = v_string
valu = pp.Or( [ v_string, v_quoted_string, v_number])
item = pp.Group( pp.Literal('(').suppress() + keyy + pp.OneOrMore( valu) + pp.Literal(')').suppress() )
# some magic - use Forward to make the dicts self-referential and thus recursive
dicts = pp.Forward()
dicts << pp.Group( pp.Literal('(').suppress() + \
keyy + \
pp.Optional( valu) + \
pp.OneOrMore( pp.Or( item, dicts)) + \
pp.Literal(')').suppress() )
print "dicts_input_works yields: ", dicts.parseString( input_works)
print "dicts_input_doesnt_work yields: ", dicts.parseString( input_doesnt_work
input_doesnt_work chokes on like 6, col 5, as if the self-reference in
pp.OneOrMore( pp.Or( item, dicts))
isn't being seen.
TIA,
code_warrior
Spotted my error 3sec after posting:
pp.OneOrMore( pp.Or( item, dicts )) + \ # wrong
pp.OneOrMore( pp.Or( [item, dicts] )) + \ # right
Never mind, nothing to see here, move along.
Thanks,
code_warrior

pyparsing: Grouping guidelines

pyparsing: The below is the code i put up which can parse a nested function call , a logical function call or a hybrid call which nests both the function and a logical function call. The dump() data adds too many unnecessary levels of braces because of grouping. Removing the Group() results in a wrong output. Is there a guideline to use Group(parsers)?
Also the Pyparsing document does'nt detail on how to walk the tree created and not much of data is available out there. Please point me to a link/guide which helps me write the tree walker for recursively parsed data for my test cases.
I will be translating this parsed data to a valid tcl code.
from pyparsing import *
from pyparsing import OneOrMore, Optional, Word, delimitedList, Suppress
# parse action -maker; # from Paul's example
def makeLRlike(numterms):
if numterms is None:
# None operator can only by binary op
initlen = 2
incr = 1
else:
initlen = {0:1,1:2,2:3,3:5}[numterms]
incr = {0:1,1:1,2:2,3:4}[numterms]
# define parse action for this number of terms,
# to convert flat list of tokens into nested list
def pa(s,l,t):
t = t[0]
if len(t) > initlen:
ret = ParseResults(t[:initlen])
i = initlen
while i < len(t):
ret = ParseResults([ret] + t[i:i+incr])
i += incr
return ParseResults([ret])
return pa
line = Forward()
fcall = Forward().setResultsName("fcall")
flogical = Forward()
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
arg = Word(alphas,alphanums+"_"+"."+"+"+"-"+"*"+"/")
args = delimitedList(arg).setResultsName("arg")
fargs = delimitedList(OneOrMore(flogical) | OneOrMore(fcall) |
OneOrMore(arg))
fname = Word(alphas,alphanums+"_")
fcall << Group(fname.setResultsName('func') + Group(lparen +
Optional(fargs) + rparen).setResultsName('fargs'))
flogic = Keyword("or") | Keyword("and") | Keyword("not")
logicalArg = delimitedList(Group(fcall.setResultsName("fcall")) |
Group(arg.setResultsName("arg")))
#logicalArg.setDebug()
flogical << Group(logicalArg.setResultsName('larg1') +
flogic.setResultsName('flogic') + logicalArg.setResultsName('larg2'))
#logical = operatorPrecedence(flogical, [(not, 1, opAssoc.RIGHT,
makeLRlike(2)),
# (and, 2, opAssoc.LEFT,
makeLRlike(2)),
# (or , 2, opAssoc.LEFT,
makeLRlike(2))])
line = flogical | fcall #change to logical if operatorPrecedence is used
# Works fine
print line.parseString("f(x, y)").dump()
print line.parseString("f(h())").dump()
print line.parseString("a and b").dump()
print line.parseString("f(a and b)").dump()
print line.parseString("f(g(x))").dump()
print line.parseString("f(a and b) or h(b not c)").dump()
print line.parseString("f(g(x), y)").dump()
print line.parseString("g(f1(x), a, b, f2(x,y, k(x,y)))").dump()
print line.parseString("f(a not c) and g(f1(x), a, b, f2(x,y,
k(x,y)))").dump()
#Does'nt work fine yet;
#try changing flogical assignment to logicalArg | flogic
#print line.parseString("a or b or c").dump()
#print line.parseString("f(a or b(x) or c)").dump()

Pyparsing: ParseAction not called

On a simple grammar I am in the bad situation that one of my ParseActions is not called.
For me this is strange as parseActions of a base symbol ("logic_oper") and a derived symbol ("cmd_line") are called correctly. Just "pa_logic_cmd" is not called. You can see this on the output which is included at the end of the code.
As there is no exception on parsing the input string, I am assuming that the grammar is (basically) correct.
import io, sys
import pyparsing as pp
def diag(msg, t):
print("%s: %s" % (msg , str(t)) )
def pa_logic_oper(t): diag('logic_oper', t)
def pa_operand(t): diag('operand', t)
def pa_ident(t): diag('ident', t)
def pa_logic_cmd(t): diag('>>>>>> logic_cmd', t)
def pa_cmd_line(t): diag('cmd_line', t)
def make_grammar():
semi = pp.Literal(';')
ident = pp.Word(pp.alphas, pp.alphanums).setParseAction(pa_ident)
operand = (ident).setParseAction(pa_operand)
op_and = pp.Keyword('A')
op_or = pp.Keyword('O')
logic_oper = (( op_and | op_or) + pp.Optional(operand))
logic_oper.setParseAction(pa_logic_oper)
logic_cmd = logic_oper + pp.Suppress(semi)
logic_cmd.setParseAction(pa_logic_cmd)
cmd_line = (logic_cmd)
cmd_line.setParseAction(pa_cmd_line)
grammar = pp.OneOrMore(cmd_line) + pp.StringEnd()
return grammar
if __name__ == "__main__":
inp_str = '''
A param1;
O param2;
A ;
'''
grammar = make_grammar()
print( "pp-version:" + pp.__version__)
parse_res = grammar.parseString( inp_str )
'''USAGE/Output: python test_4.py
pp-version:2.0.3
operand: ['param1']
logic_oper: ['A', 'param1']
cmd_line: ['A', 'param1']
operand: ['param2']
logic_oper: ['O', 'param2']
cmd_line: ['O', 'param2']
logic_oper: ['A']
cmd_line: ['A']
'''
Can anybody give me a hint on this parseAction problem?
Thanks,
The problem is here:
cmd_line = (logic_cmd)
cmd_line.setParseAction(pa_cmd_line)
The first line assigns cmd_line to be the same expression as logic_cmd. You can verify by adding this line:
print("???", cmd_line is logic_cmd)
Then the second line calls setParseAction, which overwrites the parse action of logic_cmd, so the pa_logic_cmd will never get called.
Remove the second line, since you are already testing the calling of the parse action with pa_logic_cmd. You could change to using the addParseAction method instead, but to my mind that is an invalid test (adding 2 parse actions to the same pyparsing expression object).
Or, change the definition of cmd_line to:
cmd_line = pp.Group(logic_cmd)
Now you will have wrapped logic_cmd inside another expression, and you can then independently set and test the running of parse actions on the two different expressions.

How to avoid "invalid lexical value" while adding empty nodes

let $a := <product>
<p1>100</p1>
<p2>100</p2>
<p3/>
</product>
for $i in $a
return $i/p1 + $i/p2 + $i/p3
Why do i get invalid lexical value here when i expect the sum to be displayed?
Your last line return $i/p1 + $i/p2 + $i/p3 is evaluated as return xs:double($i/p1) + xs:double($i/p2) + xs:double($i/p3). This works for p1 and p2, but not p3:
xs:double($i/p3) = xs:double(<p3/>)
= xs:double(xs:untypedAtomic('')) (: atomization :)
= xs:double('')
Since + returns the empty sequence () if one of its arguments is the empty sequence, your approach would not have worked either way. You can use fn:sum($sequence) instead, summing over the text nodes inside the elements:
let $a :=
<product>
<p1>100</p1>
<p2>100</p2>
<p3/>
</product>
for $i in $a
return sum(($i/p1/text(), $i/p2/text(), $i/p3/text()))
The last line can even be shortened to:
return sum($i/(p1, p2, p3)/text())

Resources